Python APIs created for this project

Annotation module

Annotates genomic regions with their RNA type, name, and subtype.

Annotation.overlap(bed1, bed2)

Checks whether two Bed objects on the same chromosome overlap.

Parameters:
  • bed1, bed2 – Bed objects defined by xplib.Annotation.Bed (BAM2X)
Returns:

boolean – True if the two regions overlap, otherwise False

Example:

>>> from xplib.Annotation import Bed
>>> from Annotation import overlap
>>> bed1=Bed(["chr1",10000,12000])
>>> bed2=Bed(["chr1",9000,13000])
>>> print overlap(bed1,bed2)
True
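
For readers without xplib installed, the overlap test can be sketched with plain tuples. This is a minimal sketch, not the module's code: `Region` and `regions_overlap` are hypothetical names, and it assumes BED-style half-open coordinates [start, end), so regions that only touch at a boundary do not count as overlapping.

```python
from collections import namedtuple

# Hypothetical stand-in for xplib.Annotation.Bed, holding only what the test needs.
Region = namedtuple("Region", ["chr", "start", "end"])


def regions_overlap(a, b):
    """Return True if two half-open regions [start, end) share at least one base."""
    if a.chr != b.chr:
        # The documented overlap() expects both regions on the same chromosome;
        # this sketch simply treats different chromosomes as non-overlapping.
        return False
    return a.start < b.end and b.start < a.end


if __name__ == "__main__":
    bed1 = Region("chr1", 10000, 12000)
    bed2 = Region("chr1", 9000, 13000)
    print(regions_overlap(bed1, bed2))  # True: bed1 lies inside bed2
```

Comparing each start against the other region's end covers containment and partial overlap in a single expression.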

Annotation.Subtype(bed1, genebed)

Determines whether a region lies in an intron, an exon, or a UTR of a transcript read from a BED12 file.

Parameters:
  • bed1 – a Bed object defined by xplib.Annotation.Bed (BAM2X)
  • genebed – a Bed12 object defined by xplib.Annotation.Bed, representing one transcript from a BED12 file together with its exon/intron/UTR structure
Returns:

str – RNA subtype: "intron", "exon", "utr3", "utr5", or "."

Example:

>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from Annotation import Subtype
>>> bed1=Bed(["chr13",40975747,40975770])
>>> a=DBI.init("../../Data/Ensembl_mm9.genebed.gz","bed")
>>> genebed=a.query(bed1).next()
>>> print Subtype(bed1,genebed)
intron

Annotation.annotation(bed, ref_allRNA, ref_detail, ref_repeat)

Builds on overlap() and Subtype() to annotate the RNA type, name, and subtype of any genomic region.

Parameters:
  • bed – a Bed object defined by xplib.Annotation.Bed (BAM2X)
  • ref_allRNA – the DBI.init object (from BAM2X) for a BED6 file of all kinds of RNA
  • ref_detail – the DBI.init object for a BED12 file of lincRNA and mRNA with intron, exon, and UTR structure
  • ref_repeat – the DBI.init object for a BED6 file of mouse repeats

Returns:

list of str – [type, name, subtype]

Example:

>>> from xplib.Annotation import Bed
>>> from xplib import DBI
>>> from Annotation import annotation
>>> bed=Bed(["chr13",40975747,40975770])
>>> ref_allRNA=DBI.init("../../Data/all_RNAs-rRNA_repeat.txt.gz","bed")
>>> ref_detail=DBI.init("../../Data/Ensembl_mm9.genebed.gz","bed")
>>> ref_repeat=DBI.init("../../Data/mouse.repeat.txt.gz","bed")
>>> print annotation(bed,ref_allRNA,ref_detail,ref_repeat)
['protein_coding', 'gcnt2', 'intron']

“annotated_bed” data class

class data_structure.annotated_bed(x=None, **kwargs)

Stores, compares, and clusters genomic regions that carry RNA annotation information. Used by the program Select_stronginteraction_pp.py.

Cluster(c)

Stores the cluster index on this object.

Parameters: c – cluster index

Example:

>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> a.Cluster(3)
>>> print a.cluster
3

Note

When a becomes a cluster object in Select_stronginteraction_pp.py, a.cluster holds the count information.

Update(S, E)

Updates the lower and upper bounds of the cluster after a new segment is added during Union-Find clustering.

Parameters:
  • S – start location of the newly added genomic segment
  • E – end location of the newly added genomic segment

Example:

>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> a.Update(40975700,40975800)
>>> print a.start, a.end
40975700 40975800
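
Update() keeps a cluster's bounds current as Union-Find merges segments. The sketch below shows that idea on plain (start, end) pairs. It is a hypothetical illustration, not code from Select_stronginteraction_pp.py: `IntervalUnionFind` and `cluster_segments` are invented names, and merging every pair of overlapping segments with an O(n²) scan is an assumption about the clustering rule.

```python
class IntervalUnionFind(object):
    """Union-find over (start, end) segments that tracks each cluster's bounds."""

    def __init__(self, segments):
        self.parent = list(range(len(segments)))
        self.lo = [start for start, _ in segments]
        self.hi = [end for _, end in segments]

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri
            # Same role as annotated_bed.Update(S, E): widen the surviving root's bounds.
            self.lo[ri] = min(self.lo[ri], self.lo[rj])
            self.hi[ri] = max(self.hi[ri], self.hi[rj])
        return ri


def cluster_segments(segments):
    """Merge overlapping half-open segments; return each cluster's (start, end)."""
    uf = IntervalUnionFind(segments)
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            (s1, e1), (s2, e2) = segments[i], segments[j]
            if s1 < e2 and s2 < e1:
                uf.union(i, j)
    roots = sorted({uf.find(i) for i in range(len(segments))})
    return [(uf.lo[r], uf.hi[r]) for r in roots]
```

With a single merge, the bound update matches the Update() example: clustering (40975747, 40975770) with (40975700, 40975800) yields one cluster spanning (40975700, 40975800).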

__init__(x=None, **kwargs)

Initialization example:

>>> line="chr13  40975747        40975770        +       ATTAAG...TGA    protein_coding  gcnt2   intron"
>>> a=annotated_bed(line)

or

>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770,strand="+",type="protein_coding")
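
The string form of the constructor appears to take one whitespace-separated line. A minimal parsing sketch follows; the field names and their order (chr, start, end, strand, sequence, type, name, subtype) are inferred from the example line above and are assumptions, as is the helper name `parse_annotated_line`.

```python
# Field order inferred from the example line; this is an assumption, not the class's code.
FIELDS = ["chr", "start", "end", "strand", "seq", "type", "name", "subtype"]


def parse_annotated_line(line):
    """Split a whitespace-separated line into a dict of named fields (hypothetical helper)."""
    parts = line.split()
    if len(parts) < 3:
        raise ValueError("need at least chr, start and end: %r" % line)
    record = dict(zip(FIELDS, parts))  # columns beyond FIELDS are ignored
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record
```

Splitting on arbitrary whitespace accepts both tab- and space-separated input.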

__lt__(other)

Compares self with other when the two regions do not overlap.

Parameters: other – another annotated_bed object
Returns: boolean, or None if the two regions overlap

Example:

>>> a=annotated_bed(chr="chr13",start=40975747,end=40975770)
>>> b=annotated_bed(chr="chr13",start=10003212,end=10005400)
>>> print a<b
False
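
The ordering contract above (a boolean for disjoint regions, None for overlapping ones) can be mirrored in a small stand-in class. This is a sketch only: `Segment` is hypothetical, coordinates are assumed half-open, and ordering regions on different chromosomes by chromosome name is an assumption the original documentation does not state.

```python
class Segment(object):
    """Hypothetical stand-in that mirrors the documented overlap()/__lt__ behaviour."""

    def __init__(self, chrom, start, end):
        self.chrom, self.start, self.end = chrom, start, end

    def overlap(self, other):
        # Half-open [start, end) intervals on the same chromosome.
        return (self.chrom == other.chrom
                and self.start < other.end and other.start < self.end)

    def __lt__(self, other):
        if self.overlap(other):
            return None  # no ordering is defined for overlapping regions
        if self.chrom != other.chrom:
            return self.chrom < other.chrom  # assumption: order by chromosome name first
        return self.end <= other.start
```

Because `a < b` returns whatever `__lt__` returns, sorting with this class is only meaningful for lists of mutually non-overlapping regions; the None result is falsy and would silently be treated as "not less than".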

__str__()

Returns the cluster information (chr, start, end, type, name, subtype, cluster) as a string, so that print outputs it directly.

Example:

>>> line="chr13  40975747        40975770        +       ATTAAG...TGA    protein_coding  gcnt2   intron"
>>> a=annotated_bed(line)
>>> a.Cluster(3)
>>> a.Update(40975700,40975800)
>>> print a
chr13  40975700        40975800        protein_coding  gcnt2   intron  3

overlap(other)

Checks whether this region overlaps another region.

Parameters: other – another annotated_bed object
Returns: boolean – True if the two regions overlap

“RNAstructure” class

class RNAstructure.RNAstructure(exe_path=None)

Interface class for RNAstructure executable programs.

DuplexFold(seq1=None, seq2=None, dna=False)

Uses the DuplexFold program to find the lowest free energy duplex formed by two input sequences.

Parameters:
  • seq1, seq2 – two DNA/RNA sequences given as strings, or names of existing FASTA files
  • dna – boolean; if True, DNA parameters are used instead of RNA parameters (default: False)
Returns:

minimum binding energy (unit: kcal/mol)

Example:

>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq1 = "TAGACTGATCAGTAAGTCGGTA"
>>> seq2 = "GACTAGCTTAGGTAGGATAGTCAGTA"
>>> energy=RNA_prog.DuplexFold(seq1,seq2)
>>> print energy

Fold(seq=None, ct_name=None, sso_file=None, Num=1)

Uses the Fold program to predict the secondary structure and returns it in dot-bracket format.

Parameters:
  • seq – one DNA/RNA sequence given as a string, or the name of an existing FASTA file
  • ct_name – name of the CT file to write; if None (default), the output is stored in a temporary file
  • sso_file – a single-strand offset file; for the format, see http://rna.urmc.rochester.edu/Text/File_Formats.html#Offset
  • Num – which predicted structure to return (the Num-th one; default: 1)
Returns:

the RNA sequence and its secondary structure in dot-bracket format, as a pair (sequence, dot)

Example:

>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> seq = "AUAUAAUUAAAAAAUGCAACUACAAGUUCCGUGUUUCUGACUGUUAGUUAUUGAGUUAUU"
>>> sequence,dot=RNA_prog.Fold(seq)
>>> assert(seq==sequence)
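
The RNAstructure programs write CT files, while Fold() returns dot-bracket notation. One way to derive dot-bracket from CT text is sketched below; it is an assumption that the wrapper works this way, and `ct_to_dot` is a hypothetical helper. It relies on the CT layout of a header line whose first field is the sequence length, followed by one line per nucleotide with the index in column 1, the base in column 2 and the pairing partner (0 if unpaired) in column 5.

```python
def ct_to_dot(ct_text):
    """Convert the first structure of a CT file's text into (sequence, dot_bracket)."""
    lines = [line for line in ct_text.splitlines() if line.strip()]
    length = int(lines[0].split()[0])  # header: first field is the number of bases
    sequence, dot = [], []
    for line in lines[1:length + 1]:
        fields = line.split()
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        sequence.append(base)
        if partner == 0:
            dot.append(".")  # unpaired
        elif partner > index:
            dot.append("(")  # pairs with a later base
        else:
            dot.append(")")  # pairs with an earlier base
    return "".join(sequence), "".join(dot)
```

Only the first structure in the file is read, and pseudoknotted pairs cannot be represented faithfully with a single bracket type.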

__init__(exe_path=None)

Initializes the object.

Parameters: exe_path – the folder path of the RNAstructure executables

Example:

>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")

scorer(ct_name1, ct_name2)

Uses the scorer program to compare a predicted secondary structure with an accepted structure. It calculates two quality metrics: sensitivity and positive predictive value (PPV).

Parameters:
  • ct_name1 – The name of a CT file containing predicted structure data.
  • ct_name2 – The name of a CT file containing accepted structure data; it must contain only one structure.
Returns:

sensitivity, PPV, and the number of the best predicted structure

Example:

>>> ct_name1 = "temp_prediction.ct"
>>> ct_name2 = "temp_accept.ct"
>>> from RNAstructure import RNAstructure
>>> RNA_prog = RNAstructure(exe_path="/home/yu68/Software/RNAstructure/exe/")
>>> sensitivity, PPV, Number = RNA_prog.scorer(ct_name1,ct_name2)
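
Sensitivity and PPV have simple definitions in terms of base pairs: sensitivity is the fraction of accepted pairs that were predicted, and PPV is the fraction of predicted pairs that appear in the accepted structure. The sketch below computes both with exact pair matching. `pair_metrics` is a hypothetical helper, and RNAstructure's scorer may apply its own matching rules (for example, tolerance for slipped pairs), so its numbers can differ.

```python
def pair_metrics(predicted, accepted):
    """Return (sensitivity, PPV) for two collections of (i, j) base pairs."""
    # Normalise each pair so that (8, 1) and (1, 8) count as the same pair.
    pred = {tuple(sorted(p)) for p in predicted}
    acc = {tuple(sorted(p)) for p in accepted}
    correct = len(pred & acc)
    sensitivity = float(correct) / len(acc) if acc else 0.0
    ppv = float(correct) / len(pred) if pred else 0.0
    return sensitivity, ppv
```

A prediction with extra pairs keeps full sensitivity but loses PPV, which is why the two metrics are reported together.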


RNAstructure.dot2block(dot_string, name='Default')

Converts a dot-bracket RNA secondary structure into a list of linked blocks (stems).

Parameters:
  • dot_string – the dot format of RNA secondary structure
  • name – name of the RNA
Returns:

A list of all stems; each stem is a dictionary with 'source' and 'target' keys, each holding a dictionary with 'chr', 'start', and 'end'.

Example:

>>> from RNAstructure import dot2block
>>> stems = dot2block("(((((...)))...(((...)))..))","test")
>>> print stems
[{'source': {'start': 2, 'chr': 'test', 'end': 4}, 'target': {'start': 8, 'chr': 'test', 'end': 10}}, {'source': {'start': 14, 'chr': 'test', 'end': 16}, 'target': {'start': 20, 'chr': 'test', 'end': 22}}, {'source': {'start': 0, 'chr': 'test', 'end': 1}, 'target': {'start': 25, 'chr': 'test', 'end': 26}}]
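
Grouping base pairs into stems can be done with a stack followed by a scan for directly stacked pairs. The sketch below reproduces the three stems of the example above, but it is not the module's implementation: `dot2block_sketch` is a hypothetical name, coordinates are 0-based and inclusive as in the example, ordering stems by their closing position is inferred from that example, and only round brackets are handled.

```python
def dot2block_sketch(dot, name="Default"):
    """Group the base pairs of a dot-bracket string into stems of stacked pairs."""
    stack, pairs = [], {}  # pairs maps each opening index to its closing index
    for i, ch in enumerate(dot):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])

    stems, inside = [], set()
    for i in sorted(pairs):
        if i in inside:
            continue  # already part of a stem found from an outer pair
        j, k = pairs[i], 0
        # Extend while the next pair sits directly inside the current one.
        while pairs.get(i + k + 1) == j - k - 1:
            k += 1
            inside.add(i + k)
        stems.append({"source": {"chr": name, "start": i, "end": i + k},
                      "target": {"chr": name, "start": j - k, "end": j}})
    stems.sort(key=lambda stem: stem["target"]["end"])  # order by closing position
    return stems
```

A bulge or interior loop breaks the run of directly stacked pairs, so it splits a helix into separate stems.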