Python APIs created for this project¶
Annotation module¶
For the purpose of annotating RNA types for genomic regions.
- AnnoMax.overlap(bed1, bed2)¶
This function checks whether two Bed objects on the same chromosome overlap.
Parameters: - bed1 – A Bed object from xplib.Annotation.Bed (BAM2X)
- bed2 – A Bed object from xplib.Annotation.Bed (BAM2X)
Returns: boolean – True if the two regions overlap, False otherwise
Example:
>>> from xplib.Annotation import Bed
>>> from AnnoMax import overlap
>>> bed1=Bed(["chr1",10000,12000])
>>> bed2=Bed(["chr1",9000,13000])
>>> print overlap(bed1,bed2)
True
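The overlap test itself can be illustrated without xplib. The following is a minimal sketch, not the AnnoMax implementation: `Region` is a hypothetical stand-in for the Bed object, the intervals are assumed to be half-open, and the chromosome check is added here for safety even though the documented function expects both regions to be on the same chromosome.

```python
from collections import namedtuple

# Hypothetical stand-in for xplib.Annotation.Bed; only the fields used here.
Region = namedtuple("Region", ["chr", "start", "stop"])


def regions_overlap(a, b):
    """Return True if a and b share at least one base on the same chromosome."""
    if a.chr != b.chr:
        return False
    # Assumption: half-open intervals [start, stop). Two such intervals
    # overlap when each one starts before the other one ends.
    return a.start < b.stop and b.start < a.stop


bed1 = Region("chr1", 10000, 12000)
bed2 = Region("chr1", 9000, 13000)
bed3 = Region("chr1", 12000, 12500)  # starts exactly where bed1 ends

print(regions_overlap(bed1, bed2))  # True
print(regions_overlap(bed1, bed3))  # False under the half-open assumption
```

If the coordinates are treated as closed intervals instead, the two comparisons would use `<=`, and `bed1` and `bed3` would count as overlapping.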
- AnnoMax.Subtype(bed1, genebed, typ)¶
This function determines whether a region falls in an intron, exon, or UTR of a transcript described by a BED12 entry.
Parameters: - bed1 – A Bed object defined by xplib.Annotation.Bed (BAM2X)
- genebed – A Bed12 object representing a transcript, defined by xplib.Annotation.Bed, with exon/intron/UTR information from a BED12 file
Returns: str – RNA subtype: “intron”/“exon”/“utr3”/“utr5”/“.”
Example:
>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from AnnoMax import Subtype
>>> bed1=Bed(["chr13",40975747,40975770])
>>> a=DBI.init("../../Data/Ensembl_mm9.genebed.gz","bed")
>>> genebed=a.query(bed1).next()
>>> print Subtype(bed1,genebed)
"intron"
- AnnoMax.optimize_annotation(c_dic, bed, ref_detail)¶
This function will select an optimized annotation for the bed region from the genes in c_dic.
The annotation is chosen according to the following priority order: exon/utr of coding transcript > small RNA > exon of lincRNA > small RNA > exon/utr of nc transcript > intron of mRNA > intron of lincRNA. Genes on the same strand as the read (ProperStrand) always have higher priority than those on the opposite strand (NonProperStrand). Repeat elements have the lowest priority (except rRNA_repeat, according to the annotation files).
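A priority-based selection of this kind can be sketched with a sort key. This is an illustration under stated assumptions rather than the code of optimize_annotation: the category labels and the gene names other than gcnt2 are hypothetical, only categories whose relative order is unambiguous in the list above are included, and repeats are assumed to rank below every gene regardless of strand.

```python
# Illustrative priority order, highest first (hypothetical labels).
PRIORITY = [
    "coding_exon_utr",
    "lincRNA_exon",
    "mRNA_intron",
    "lincRNA_intron",
    "repeat",
]
RANK = dict((label, i) for i, label in enumerate(PRIORITY))


def pick_annotation(candidates):
    """Pick the best (category, name, proper_strand) tuple, or None if empty."""
    if not candidates:
        return None

    def sort_key(candidate):
        category, _name, proper_strand = candidate
        # Tuples compare left to right: repeats always sort last, then
        # ProperStrand beats NonProperStrand, then the category rank decides.
        return (category == "repeat", not proper_strand, RANK[category])

    return min(candidates, key=sort_key)


candidates = [
    ("mRNA_intron", "gcnt2", True),       # same strand as the read
    ("coding_exon_utr", "geneA", False),  # opposite strand
    ("repeat", "repeatB", True),
]
best = pick_annotation(candidates)
print(best)  # ('mRNA_intron', 'gcnt2', True)
```

The intron on the proper strand wins over the coding exon on the opposite strand because the strand flag is compared before the category rank.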
- AnnoMax.annotation(bed, ref_allRNA, ref_detail, ref_repeat)¶
This function builds on the overlap(), optimize_annotation() and Subtype() functions to annotate the RNA type/name/subtype of any genomic region. It first finds the genes with maximum overlap with bed, then uses optimize_annotation() to select an optimized annotation for bed, in the following steps:
- Find hits (genes) with overlaps larger than Perc_overlap of the bed region length and build dic
- Find hits (genes) with overlaps between (Perc_max * max_overlap, max_overlap) and build P_dic (for ProperStrand), N_dic (for NonProperStrand).
- Find an annotation for the bed region among the hits.
Parameters: - bed – A Bed object defined by xplib.Annotation.Bed (in BAM2X).
- ref_allRNA – the DBI.init object (from BAM2X) for the bed6 file of all kinds of RNA
- ref_detail – the DBI.init object for the bed12 file of lincRNA and mRNA with intron, exon, UTR
- ref_repeat – the DBI.init object for the bed6 file of mouse repeats
Returns: list of str – [type, name, subtype, strandcolumn]
Example:
>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from AnnoMax import annotation
>>> bed=Bed(["chr13",40975747,40975770])
>>> ref_allRNA=DBI.init("all_RNAs-rRNA_repeat.txt.gz","bed")
>>> ref_detail=DBI.init("Data/Ensembl_mm9.genebed.gz","bed")
>>> ref_repeat=DBI.init("Data/mouse.repeat.txt.gz","bed")
>>> print annotation(bed,ref_allRNA,ref_detail,ref_repeat)
["protein_coding","gcnt2","intron","ProperStrand"]
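The first two filtering steps of annotation() can be sketched on plain tuples. This is a rough illustration, not the module's code: the default values of `perc_overlap` and `perc_max` are placeholders chosen for the example, the hit names are hypothetical, and the lower bound of the second filter is assumed to be inclusive.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared positions between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def split_hits(bed, hits, perc_overlap=0.7, perc_max=0.85):
    """Filter hits by overlap and split them by strand.

    bed is (start, end, strand); hits is a list of (name, start, end, strand).
    The two thresholds are placeholder values, not the module's defaults.
    """
    start, end, strand = bed
    length = end - start

    # Step 1: keep hits covering more than perc_overlap of the region.
    dic = {}
    for name, h_start, h_end, h_strand in hits:
        ov = overlap_length(start, end, h_start, h_end)
        if ov > perc_overlap * length:
            dic[name] = (ov, h_strand)
    if not dic:
        return {}, {}

    # Step 2: keep hits close to the best overlap, split by strand.
    max_overlap = max(ov for ov, _strand in dic.values())
    p_dic, n_dic = {}, {}
    for name, (ov, h_strand) in dic.items():
        if ov >= perc_max * max_overlap:
            target = p_dic if h_strand == strand else n_dic
            target[name] = ov
    return p_dic, n_dic


bed = (100, 200, "+")
hits = [
    ("geneA", 50, 250, "+"),   # overlap 100
    ("geneB", 90, 195, "-"),   # overlap 95, opposite strand
    ("geneC", 150, 400, "+"),  # overlap 50, dropped in step 1
    ("geneD", 120, 300, "+"),  # overlap 80, dropped in step 2
]
p_dic, n_dic = split_hits(bed, hits)
print(p_dic)  # {'geneA': 100}
print(n_dic)  # {'geneB': 95}
```

The third step, choosing one annotation among the surviving hits, is what optimize_annotation() is described as doing.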
“annotated_bed” data class¶
- class data_structure.annotated_bed(x=None, **kwargs)¶
Stores, compares, and clusters genomic regions with RNA annotation information. Used in the program Select_stronginteraction_pp.py
- Cluster(c)¶
Store cluster information of self object
Parameters: c – cluster index
Example:
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> a.Cluster(3)
>>> print a.cluster
3
Note
a.cluster holds the count information once a becomes a cluster object in Select_stronginteraction_pp.py
- Update(S, E)¶
Update the upper and lower bound of the cluster after adding segments using Union-Find.
Parameters: - S – start loc of the newly added genomic segment
- E – end loc of the newly added genomic segment
Example:
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> a.Update(40975700,40975800)
>>> print a.start, a.end
40975700 40975800
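How bound updates fit into Union-Find clustering can be sketched as follows. This is a simplified illustration, not the code used in Select_stronginteraction_pp.py: `Segment` is a hypothetical stand-in for annotated_bed, and `update` is assumed to widen the bounds with min/max, which agrees with the example above but is not confirmed by it.

```python
class Segment(object):
    """Hypothetical minimal stand-in for annotated_bed."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.parent = self  # each segment starts as the root of its own cluster

    def update(self, s, e):
        # Assumption: widen the cluster bounds to cover the added segment.
        self.start = min(self.start, s)
        self.end = max(self.end, e)


def find(seg):
    """Return the root of seg's cluster, shortening the path on the way."""
    while seg.parent is not seg:
        seg.parent = seg.parent.parent  # path halving
        seg = seg.parent
    return seg


def union(a, b):
    """Merge the clusters of a and b and return the surviving root."""
    root_a, root_b = find(a), find(b)
    if root_a is root_b:
        return root_a
    root_b.parent = root_a
    root_a.update(root_b.start, root_b.end)
    return root_a


segs = [Segment(100, 150), Segment(140, 200), Segment(500, 550)]
union(segs[0], segs[1])
root = find(segs[1])
print(root.start, root.end)
```

After the merge, the root of the first cluster spans 100 to 200, while the third segment remains its own cluster.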
- __init__(x=None, **kwargs)¶
Initiation example:
>>> str="chr13 40975747 40975770 + ATTAAG...TGA protein_coding gcnt2 intron"
>>> a=annotated_bed(str)
or
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770,strand='+',type="protein_coding",)
- __lt__(other)¶
Compare self and other when the two regions do not overlap
Parameters: other – another annotated_bed object
Returns: boolean – “None” if overlapped.
Example:
>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> b=annotated_bed(chr="chr13",start=10003212,end=10005400)
>>> print a>b
False
- __str__()¶
Use print function to output the cluster information (chr, start, end, type, name, subtype,cluster)
Example:
>>> str="chr13 40975747 40975770 + ATTAAG...TGA protein_coding gcnt2 intron"
>>> a=annotated_bed(str)
>>> a.Cluster(3)
>>> a.Update(40975700,40975800)
>>> print a
"chr13 40975700 40975800 protein_coding gcnt2 intron 3"
- overlap(other)¶
Find overlap between regions
Parameters: other – another annotated_bed object
Returns: boolean
“RNAstructure” class¶
- class RNAstructure.RNAstructure(exe_path=None)¶
Interface class for RNAstructure executable programs.
- DuplexFold(seq1=None, seq2=None, dna=False)¶
Use the “DuplexFold” program to calculate the minimum folding energy between two input sequences
Parameters: - seq1,seq2 – two DNA/RNA sequences as strings, or names of existing fasta files
- dna – boolean; set to True to use DNA parameters
Returns: minimum binding energy (unit: kcal/mol)
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq1 = "TAGACTGATCAGTAAGTCGGTA"
>>> seq2 = "GACTAGCTTAGGTAGGATAGTCAGTA"
>>> energy=RNA_prog.DuplexFold(seq1,seq2)
>>> print energy
- Fold(seq=None, ct_name=None, sso_file=None, Num=1)¶
Use “Fold” program to predict the secondary structure and output dot format.
Parameters: - seq – one DNA/RNA sequence as string, or existing fasta file name
- ct_name – specify to output a ct file with this name, otherwise store in temp, default: None
- sso_file – give a single strand offset file, format see http://rna.urmc.rochester.edu/Text/File_Formats.html#Offset
- Num – choose the Num-th predicted structure
Returns: the RNA sequence and the dot format of the RNA secondary structure.
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq = "AUAUAAUUAAAAAAUGCAACUACAAGUUCCGUGUUUCUGACUGUUAGUUAUUGAGUUAUU"
>>> sequence,dot=RNA_prog.Fold(seq)
>>> assert(seq==sequence)
- __init__(exe_path=None)¶
Initiation of object
Parameters: exe_path – the folder path of the RNAstructure executables
Example:
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
- scorer(ct_name1, ct_name2)¶
Use the ‘scorer’ program to compare a predicted secondary structure to an accepted structure. It calculates two quality metrics, sensitivity and PPV.
Parameters: - ct_name1 – The name of a CT file containing predicted structure data.
- ct_name2 – The name of a CT file containing accepted structure data, can only store one structure.
Returns: sensitivity, PPV, number of the best predicted structure.
Example:
>>> ct_name1 = "temp_prediction.ct"
>>> ct_name2 = "temp_accept.ct"
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> sensitivity, PPV, Number = RNA_prog.scorer(ct_name1,ct_name2)
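The two metrics can be illustrated on sets of base pairs. This is a minimal sketch that works on in-memory pairs rather than CT files and counts exact matches only; the matching rules of the scorer program itself may differ. Sensitivity is the fraction of accepted pairs that were predicted, and PPV is the fraction of predicted pairs that are in the accepted structure. The pair sets below are made up for the example.

```python
def sensitivity_ppv(predicted, accepted):
    """Compute (sensitivity, PPV) from two sets of (i, j) base pairs."""
    shared = len(predicted & accepted)
    # Guard against empty structures to avoid division by zero.
    sensitivity = shared / float(len(accepted)) if accepted else 0.0
    ppv = shared / float(len(predicted)) if predicted else 0.0
    return sensitivity, ppv


# Made-up example structures, as sets of paired positions.
accepted_pairs = {(0, 26), (1, 25), (2, 10), (3, 9)}
predicted_pairs = {(0, 26), (1, 25), (2, 10), (14, 22), (15, 21)}

sens, ppv = sensitivity_ppv(predicted_pairs, accepted_pairs)
print(sens, ppv)  # 0.75 0.6
```

Three of the four accepted pairs are recovered (sensitivity 0.75), and three of the five predicted pairs are correct (PPV 0.6).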
- RNAstructure.dot2block(dot_string, name='Default')¶
Convert the dot format of an RNA secondary structure into several linked blocks
Parameters: - dot_string – the dot format of RNA secondary structure
- name – name of the RNA
Returns: A list of all stems, each stem is a dictionary with ‘source’ and ‘target’
Example:
>>> from RNAstructure import dot2block
>>> stems = dot2block("(((((...)))...(((...)))..))","RNA_X")
>>> print stems
[{'source': {'start': 2, 'chr': 'test', 'end': 4}, 'target': {'start': 8, 'chr': 'test', 'end': 10}}, {'source': {'start': 14, 'chr': 'test', 'end': 16}, 'target': {'start': 20, 'chr': 'test', 'end': 22}}, {'source': {'start': 0, 'chr': 'test', 'end': 1}, 'target': {'start': 25, 'chr': 'test', 'end': 26}}]
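One way to derive such stems from a dot string is a stack-based scan that merges directly nested pairs. The sketch below is an independent illustration, not the dot2block source: `dot_to_stems` is a hypothetical name, it assumes balanced brackets and 0-based inclusive coordinates, and it writes the given name into the 'chr' field, which differs from the 'test' value shown in the output above.

```python
def dot_to_stems(dot_string, name="Default"):
    """Group the base pairs of a dot-bracket string into stems."""
    stack = []   # positions of "(" that are still unmatched
    stems = []   # each stem as [src_start, src_end, tgt_start, tgt_end]
    prev = None  # the most recently closed pair (i, j)
    for j, char in enumerate(dot_string):
        if char == "(":
            stack.append(j)
        elif char == ")":
            i = stack.pop()
            if prev is not None and i == prev[0] - 1 and j == prev[1] + 1:
                # This pair wraps directly around the previous one:
                # extend the current stem outward on both sides.
                stems[-1][0] = i
                stems[-1][3] = j
            else:
                stems.append([i, i, j, j])
            prev = (i, j)
    return [
        {"source": {"chr": name, "start": s[0], "end": s[1]},
         "target": {"chr": name, "start": s[2], "end": s[3]}}
        for s in stems
    ]


stems = dot_to_stems("(((((...)))...(((...)))..))", "RNA_X")
for stem in stems:
    print(stem["source"]["start"], stem["source"]["end"],
          stem["target"]["start"], stem["target"]["end"])
```

For this input the sketch yields three stems, with source/target ranges 2-4/8-10, 14-16/20-22 and 0-1/25-26, the same coordinates and order as in the example output above.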