Python APIs created for this project
Annotation module
Functions for annotating genomic regions with RNA types.
- Annotation.overlap(bed1, bed2)
This function tests whether two Bed objects on the same chromosome overlap.
Parameters:
- bed1 – A Bed object from xplib.Annotation.Bed (BAM2X)
- bed2 – A Bed object from xplib.Annotation.Bed (BAM2X)
Returns: boolean – True if the two regions overlap, False otherwise
Example:
>>> from xplib.Annotation import Bed
>>> from Annotation import overlap
>>> bed1=Bed(["chr1",10000,12000])
>>> bed2=Bed(["chr1",9000,13000])
>>> print overlap(bed1,bed2)
True
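The overlap test can be illustrated without xplib. The sketch below is not the project's implementation: `Region` and `regions_overlap` are hypothetical stand-ins, and the coordinates are assumed to be half-open intervals as in the BED format.

```python
from collections import namedtuple

# Hypothetical stand-in for xplib.Annotation.Bed; only the fields needed here.
Region = namedtuple("Region", "chrom start stop")


def regions_overlap(a, b):
    """Return True when two half-open regions on the same chromosome overlap."""
    # Two intervals overlap when each one starts before the other one ends.
    return a.chrom == b.chrom and a.start < b.stop and b.start < a.stop


bed1 = Region("chr1", 10000, 12000)
bed2 = Region("chr1", 9000, 13000)
result = regions_overlap(bed1, bed2)
```

With the coordinates from the example above, `result` is True; regions on different chromosomes, or regions that only touch at an endpoint, are reported as non-overlapping under the half-open assumption.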
- Annotation.Subtype(bed1, genebed)
This function determines whether a region falls in an intron, exon, or UTR of a transcript from a BED12 file.
Parameters:
- bed1 – A Bed object defined by xplib.Annotation.Bed (BAM2X)
- genebed – A Bed12 object representing a transcript, defined by xplib.Annotation.Bed, with exon/intron/UTR information from a BED12 file
Returns: str – RNA subtype: "intron", "exon", "utr3", "utr5", or "."
Example:
>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from Annotation import Subtype
>>> bed1=Bed(["chr13",40975747,40975770])
>>> a=DBI.init("../../Data/Ensembl_mm9.genebed.gz","bed")
>>> genebed=a.query(bed1).next()
>>> print Subtype(bed1,genebed)
"intron"
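One way such a classification could work is sketched below. This is an illustration under stated assumptions, not the logic of Subtype(): the `Transcript` record and `subtype_sketch` function are hypothetical, the region is classified by its midpoint, and the coding region is taken to be the BED12 thickStart/thickEnd span.

```python
from collections import namedtuple

# Hypothetical stand-in for a BED12 transcript; exons are (start, end) pairs
# in genomic coordinates, cds_start/cds_end mirror BED12 thickStart/thickEnd.
Transcript = namedtuple(
    "Transcript", "chrom start end strand cds_start cds_end exons"
)


def subtype_sketch(chrom, start, end, tx):
    """Classify a region by its midpoint (an assumption of this sketch)."""
    mid = (start + end) // 2
    if chrom != tx.chrom or not (tx.start <= mid < tx.end):
        return "."  # region lies outside the transcript
    if not any(s <= mid < e for s, e in tx.exons):
        return "intron"  # inside the transcript but in no exon
    if mid < tx.cds_start:
        # Upstream of the coding region on '+', downstream on '-'.
        return "utr5" if tx.strand == "+" else "utr3"
    if mid >= tx.cds_end:
        return "utr3" if tx.strand == "+" else "utr5"
    return "exon"


tx = Transcript("chr1", 100, 1000, "+", 200, 900,
                [(100, 300), (500, 700), (800, 1000)])
labels = [subtype_sketch("chr1", s, e, tx)
          for s, e in [(120, 140), (250, 260), (350, 360), (950, 960)]]
```

For this made-up transcript, `labels` is `["utr5", "exon", "intron", "utr3"]`. A region that spans a boundary would need a tie-breaking rule, which the midpoint choice sidesteps here.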
- Annotation.annotation(bed, ref_allRNA, ref_detail, ref_repeat)
This function builds on the overlap() and Subtype() functions to annotate the RNA type, name, and subtype of any genomic region.
Parameters:
- bed – A Bed object defined by xplib.Annotation.Bed (in BAM2X)
- ref_allRNA – the DBI.init object (from BAM2X) for the BED6 file of all kinds of RNA
- ref_detail – the DBI.init object for the BED12 file of lincRNA and mRNA with intron, exon, and UTR information
- ref_repeat – the DBI.init object for the BED6 file of mouse repeats
Returns: list of str – [type, name, subtype]
Example:
>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from Annotation import annotation
>>> bed=Bed(["chr13",40975747,40975770])
>>> ref_allRNA=DBI.init("../../Data/all_RNAs-rRNA_repeat.txt.gz","bed")
>>> ref_detail=DBI.init("../../Data/Ensembl_mm9.genebed.gz","bed")
>>> ref_repeat=DBI.init("../../Data/mouse.repeat.txt.gz","bed")
>>> print annotation(bed,ref_allRNA,ref_detail,ref_repeat)
["protein_coding","gcnt2","intron"]
“annotated_bed” data class
- class data_structure.annotated_bed(x=None, **kwargs)
Stores, compares, and clusters genomic regions together with their RNA annotation information. Used in the program Select_stronginteraction_pp.py.
- Cluster(c)
Store the cluster information of the object.
Parameters: c – cluster index
Example:
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> a.Cluster(3)
>>> print a.cluster
3
Note
a.cluster holds the count information once a becomes a cluster object in Select_stronginteraction_pp.py
- Update(S, E)
Update the lower and upper bounds of the cluster after adding segments using Union-Find.
Parameters:
- S – start location of the newly added genomic segment
- E – end location of the newly added genomic segment
Example:
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> a.Update(40975700,40975800)
>>> print a.start, a.end
40975700 40975800
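The bound update can be pictured as widening an interval so that it covers each newly merged segment. The sketch below assumes that behaviour (it is consistent with the example above but is not taken from the project's source); `ClusterBounds` is a hypothetical stand-in for annotated_bed.

```python
class ClusterBounds(object):
    """Hypothetical stand-in that tracks only the extent of a cluster."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def update(self, s, e):
        # Widen the cluster so that it covers the new segment [s, e).
        self.start = min(self.start, s)
        self.end = max(self.end, e)


bounds = ClusterBounds(40975747, 40975770)
bounds.update(40975700, 40975800)  # segment extends both ends
bounds.update(40975750, 40975760)  # segment already covered, no change
```

After both calls, `bounds.start` is 40975700 and `bounds.end` is 40975800, matching the values printed in the example above.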
- __init__(x=None, **kwargs)
Initialization example:
>>> str="chr13 40975747 40975770 + ATTAAG...TGA protein_coding gcnt2 intron"
>>> a=annotated_bed(str)
or
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770,strand='+',type="protein_coding",)
- __lt__(other)
Compare two objects, self and other, when they do not overlap.
Parameters: other – another annotated_bed object
Returns: boolean, or None if the two regions overlap
Example:
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> b=annotated_bed(chr="chr13",start=10003212,end=10005400)
>>> print a>b
False
- __str__()
Use the print function to output the cluster information (chr, start, end, type, name, subtype, cluster).
Example:
>>> str="chr13 40975747 40975770 + ATTAAG...TGA protein_coding gcnt2 intron"
>>> a=annotated_bed(str)
>>> a.Cluster(3)
>>> a.Update(40975700,40975800)
>>> print a
"chr13 40975700 40975800 protein_coding gcnt2 intron 3"
- overlap(other)
Test whether this region overlaps another region.
Parameters: other – another annotated_bed object
Returns: boolean
“RNAstructure” class
- class RNAstructure.RNAstructure(exe_path=None)
Interface class for RNAstructure executable programs.
- DuplexFold(seq1=None, seq2=None, dna=False)
Use the “DuplexFold” program to calculate the minimum folding energy between two input sequences.
Parameters:
- seq1, seq2 – two DNA/RNA sequences as strings, or names of existing FASTA files
- dna – boolean; set to True when DNA parameters are to be used
Returns: minimum binding energy (unit: kcal/mol)
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq1 = "TAGACTGATCAGTAAGTCGGTA"
>>> seq2 = "GACTAGCTTAGGTAGGATAGTCAGTA"
>>> energy=RNA_prog.DuplexFold(seq1,seq2)
>>> print energy
- Fold(seq=None, ct_name=None, sso_file=None, Num=1)
Use the “Fold” program to predict the secondary structure and output it in dot format.
Parameters:
- seq – one DNA/RNA sequence as a string, or the name of an existing FASTA file
- ct_name – name of the CT file to write; otherwise the output is stored in a temporary file (default: None)
- sso_file – a single-strand offset file; for the format see http://rna.urmc.rochester.edu/Text/File_Formats.html#Offset
- Num – return the Num-th predicted structure
Returns: the RNA sequence and the dot format of its secondary structure
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq = "AUAUAAUUAAAAAAUGCAACUACAAGUUCCGUGUUUCUGACUGUUAGUUAUUGAGUUAUU"
>>> sequence,dot=RNA_prog.Fold(seq)
>>> assert seq==sequence
- __init__(exe_path=None)
Initialize the object.
Parameters: exe_path – the folder path of the RNAstructure executables
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
- scorer(ct_name1, ct_name2)
Use the “scorer” program to compare a predicted secondary structure to an accepted structure. It calculates two quality metrics, sensitivity and PPV.
Parameters:
- ct_name1 – the name of a CT file containing predicted structure data
- ct_name2 – the name of a CT file containing accepted structure data; it can store only one structure
Returns: sensitivity, PPV, and the number of the best predicted structure
Example:
>>> ct_name1 = "temp_prediction.ct"
>>> ct_name2 = "temp_accept.ct"
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> sensitivity, PPV, Number = RNA_prog.scorer(ct_name1,ct_name2)
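For readers unfamiliar with the two metrics: sensitivity is the fraction of accepted base pairs that were predicted, and PPV is the fraction of predicted base pairs that are in the accepted structure. The sketch below computes both from sets of (i, j) pairs. It is an illustration only: `score_pairs` is a hypothetical helper, it does not call the scorer executable or read CT files, and it counts only exact pair matches.

```python
def score_pairs(predicted, accepted):
    """Return (sensitivity, PPV) for two collections of (i, j) base pairs."""
    predicted = set(predicted)
    accepted = set(accepted)
    correct = len(predicted & accepted)  # pairs present in both structures
    # float() keeps the division exact under both Python 2 and Python 3.
    sensitivity = float(correct) / len(accepted) if accepted else 0.0
    ppv = float(correct) / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Made-up pairs for illustration only.
predicted_pairs = [(1, 10), (2, 9), (3, 8), (4, 7)]
accepted_pairs = [(1, 10), (2, 9), (3, 8), (12, 20), (13, 19)]
sensitivity, ppv = score_pairs(predicted_pairs, accepted_pairs)
```

Here three of the five accepted pairs are recovered (sensitivity 3/5) and three of the four predicted pairs are correct (PPV 3/4).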
- RNAstructure.dot2block(dot_string, name='Default')
Convert the dot format of an RNA secondary structure into several linked blocks.
Parameters:
- dot_string – the dot format of the RNA secondary structure
- name – name of the RNA
Returns: a list of all stems; each stem is a dictionary with 'source' and 'target'
Example:
>>> from RNAstructure import dot2block
>>> stems = dot2block("(((((...)))...(((...)))..))","RNA_X")
>>> print stems
[{'source': {'start': 2, 'chr': 'test', 'end': 4}, 'target': {'start': 8, 'chr': 'test', 'end': 10}},
 {'source': {'start': 14, 'chr': 'test', 'end': 16}, 'target': {'start': 20, 'chr': 'test', 'end': 22}},
 {'source': {'start': 0, 'chr': 'test', 'end': 1}, 'target': {'start': 25, 'chr': 'test', 'end': 26}}]
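A stack-based parser is one way to turn a dot string into stems like those shown above. The sketch below is not the project's dot2block: `dot_to_stems` is a hypothetical function that assumes 0-based inclusive positions, treats only directly stacked pairs as one stem, and always labels 'chr' with the name it is given.

```python
def dot_to_stems(dot_string, name="Default"):
    """Group the base pairs of a dot-bracket string into stems."""
    open_positions = []  # stack of unmatched '(' positions
    stems = []  # each entry: [source_start, source_end, target_start, target_end]
    for pos, symbol in enumerate(dot_string):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced ')' at position %d" % pos)
            left = open_positions.pop()
            last = stems[-1] if stems else None
            # A pair that stacks directly outside the previous pair extends
            # the current stem; anything else starts a new stem.
            if last and left == last[0] - 1 and pos == last[3] + 1:
                last[0] = left
                last[3] = pos
            else:
                stems.append([left, left, pos, pos])
    if open_positions:
        raise ValueError("unbalanced '(' in dot string")
    return [
        {"source": {"chr": name, "start": s0, "end": s1},
         "target": {"chr": name, "start": t0, "end": t1}}
        for s0, s1, t0, t1 in stems
    ]


stems = dot_to_stems("(((((...)))...(((...)))..))", "test")
```

For this input the sketch yields three stems, with sources 2-4, 14-16, and 0-1 paired to targets 8-10, 20-22, and 25-26, in the order in which the stems close.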