💻 B6: Finding all CDS

Task description

Using BioPython, find the sequences of the different coding regions annotated as CDS. A CDS (coding sequence) is the region that remains after the introns have been spliced out.

from Bio import SeqIO

# Load the single record stored in the GenBank file
gene_record = SeqIO.read("yersinia-pestis-fasta/NC_005816.gb", "genbank")

The total length of the DNA sequence is:

print(len(gene_record.seq))
9609

What interests us, however, is gene_record.features, a list of features that matter a great deal for describing the sequence itself. Once we start working with these sequences, this organization makes it easy to retrieve the "more abstract" information that is known about the sequence.

The total number of these features can be obtained with:

print(len(gene_record.features))
41
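The 41 features mix several kinds of annotation, so a quick tally by type shows what the list contains. The sketch below is only an illustration. It builds a small hypothetical in-memory record (`toy_record`, with invented bases and coordinates) instead of reading NC_005816.gb, so its counts are not those of the real plasmid. The same `Counter` expression should work unchanged on `gene_record.features`.

```python
from collections import Counter

from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record; sequence and coordinates are invented for illustration.
toy_record = SeqRecord(Seq("CCATGAAATTTGGGTAAGGTCAGGGCATCC"), id="toy")
toy_record.features = [
    SeqFeature(FeatureLocation(0, 30, strand=1), type="source"),
    SeqFeature(FeatureLocation(2, 17, strand=1), type="gene"),
    SeqFeature(FeatureLocation(2, 17, strand=1), type="CDS"),
    SeqFeature(FeatureLocation(19, 28, strand=-1), type="gene"),
    SeqFeature(FeatureLocation(19, 28, strand=-1), type="CDS"),
]

# Tally how many features of each type the record carries.
type_counts = Counter(feature.type for feature in toy_record.features)
for feature_type, count in type_counts.most_common():
    print(f"{feature_type}: {count}")
```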

Each feature has several attributes. Looking at the first one in the list, for example, we see a number of attributes describing it, the most important of which are:

  • .type: the type of feature ("CDS", "gene", …)

  • .location: the location on the sequence itself, as a kind of mapping (start:end)

dir(gene_record.features[0])
['__bool__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_flip',
 '_get_location_operator',
 '_get_ref',
 '_get_ref_db',
 '_get_strand',
 '_set_location_operator',
 '_set_ref',
 '_set_ref_db',
 '_set_strand',
 '_shift',
 'extract',
 'id',
 'location',
 'location_operator',
 'qualifiers',
 'ref',
 'ref_db',
 'strand',
 'translate',
 'type']
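Of the attributes listed, `type`, `location`, and `qualifiers` carry most of the annotation. As a rough illustration of how a location reads, the sketch below uses a hypothetical hand-built feature: the coordinates are borrowed from the pesticin entry printed later in this section, and the `locus_tag` value is a placeholder. Biopython locations follow Python slicing conventions (zero-based start, exclusive end), so the feature length is `end - start`.

```python
from Bio.SeqFeature import FeatureLocation, SeqFeature

# Hypothetical feature built by hand; on the real record this would be
# an element of gene_record.features.
feature = SeqFeature(
    FeatureLocation(4814, 5888, strand=-1),
    type="CDS",
    qualifiers={"locus_tag": ["example_locus"]},
)

start = int(feature.location.start)   # zero-based, inclusive
end = int(feature.location.end)       # exclusive, as in Python slices
strand = feature.location.strand      # +1 forward, -1 reverse

print(feature.type, start, end, strand)
print("length in bases:", len(feature))  # same as end - start
print("locus_tag:", feature.qualifiers["locus_tag"][0])
```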

An example from the book, and an interesting CDS, is the "pim" gene, YP_pPCP05, which lies in the sequence between base pairs [4342:4780]:

print(gene_record.features[21])
type: CDS
location: [4342:4780](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478716', 'GeneID:2767712']
    Key: gene, Value: ['pim']
    Key: locus_tag, Value: ['YP_pPCP05']
    Key: note, Value: ['similar to many previously sequenced pesticin immunity protein entries of Yersinia pestis plasmid pPCP, e.g. gi| 16082683|,ref|NP_395230.1| (NC_003132) , gi|1200166|emb|CAA90861.1| (Z54145 ) , gi|1488655| emb|CAA63439.1| (X92856) , gi|2996219|gb|AAC62543.1| (AF053945) , and gi|5763814|emb|CAB531 67.1| (AL109969)']
    Key: product, Value: ['pesticin immunity protein']
    Key: protein_id, Value: ['NP_995571.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MGGGMISKLFCLALIFLSSSGLAEKNTYTAKDILQNLELNTFGNSLSHGIYGKQTTFKQTEFTNIKSNTKKHIALINKDNSWMISLKILGIKRDEYTVCFEDFSLIRPPTYVAIHPLLIKKVKSGNFIVVKEIKKSIPGCTVYYH']
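The printout shows where the CDS lies, but the task asks for its sequence. The following is a minimal sketch of that step, assuming a hypothetical toy record (`toy_record`, with invented bases, coordinates, and qualifier values) rather than the real file. `SeqFeature.extract` slices the parent sequence at the feature's location and reverse-complements it for features on the minus strand. Translating the result with the table named in `transl_table` can then be checked against the stored `translation` qualifier.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record: one CDS on each strand, all values invented.
toy_record = SeqRecord(Seq("CCATGAAATTTGGGTAAGGTCAGGGCATCC"), id="toy")
forward_cds = SeqFeature(
    FeatureLocation(2, 17, strand=1),
    type="CDS",
    qualifiers={"transl_table": ["11"], "translation": ["MKFG"]},
)
reverse_cds = SeqFeature(
    FeatureLocation(19, 28, strand=-1),
    type="CDS",
    qualifiers={"transl_table": ["11"], "translation": ["MP"]},
)

for cds in (forward_cds, reverse_cds):
    # extract() returns the coding strand, reverse-complemented if needed.
    cds_seq = cds.extract(toy_record.seq)
    table = int(cds.qualifiers["transl_table"][0])
    # to_stop=True drops the terminal stop codon, matching the qualifier.
    protein = cds_seq.translate(table=table, to_stop=True)
    matches = str(protein) == cds.qualifiers["translation"][0]
    print(cds.location, cds_seq, protein, matches)
```

On real annotations a CDS may begin with an alternative start codon; in that case passing `cds=True` to `translate` is probably the safer choice, since it treats the first codon as methionine and validates the reading frame.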

However, we need to find all such CDS features. We can do so by searching the whole list of 41 features for those whose type is "CDS":

# Collect the indices of all features annotated as coding sequences
CDS_list = []

for i, feature in enumerate(gene_record.features):
    if feature.type == "CDS":
        CDS_list.append(i)

print(f"Number of CDS found: {len(CDS_list)}")
Number of CDS found: 10
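Working with the feature objects themselves, rather than their indices, is a common alternative and makes a compact summary easy to print. The following is a sketch on a hypothetical toy record (invented coordinates and `locus_tag` values). The helper name `summarize_cds` is an assumption made for illustration and is not part of Biopython.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def summarize_cds(record):
    """Return (locus_tag, start, end, strand) for every CDS feature."""
    rows = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists; fall back to "?" if the tag is absent.
        locus_tag = feature.qualifiers.get("locus_tag", ["?"])[0]
        location = feature.location
        rows.append(
            (locus_tag, int(location.start), int(location.end), location.strand)
        )
    return rows


# Hypothetical toy record standing in for gene_record.
toy_record = SeqRecord(
    Seq("CCATGAAATTTGGGTAAGGTCAGGGCATCC"),
    id="toy",
    features=[
        SeqFeature(FeatureLocation(2, 17, strand=1), type="gene"),
        SeqFeature(FeatureLocation(2, 17, strand=1), type="CDS",
                   qualifiers={"locus_tag": ["toy_01"]}),
        SeqFeature(FeatureLocation(19, 28, strand=-1), type="CDS",
                   qualifiers={"locus_tag": ["toy_02"]}),
    ],
)

for row in summarize_cds(toy_record):
    print(*row, sep="\t")
```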

We see that there are 10 of them in total, and we can print them:

for i in CDS_list:
    print(gene_record.features[i])
type: CDS
location: [86:1109](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478712', 'GeneID:2767718']
    Key: locus_tag, Value: ['YP_pPCP01']
    Key: note, Value: ['similar to corresponding CDS from previously sequenced pPCP plasmid of Yersinia pestis KIM (AF053945) and CO92 (AL109969), also many transposase entries for insertion sequence IS100 of Yersinia pestis. Contains IS21-like element transposase, HTH domain (Interpro|IPR007101)']
    Key: product, Value: ['putative transposase']
    Key: protein_id, Value: ['NP_995567.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MVTFETVMEIKILHKQGMSSRAIARELGISRNTVKRYLQAKSEPPKYTPRPAVASLLDEYRDYIRQRIADAHPYKIPATVIAREIRDQGYRGGMTILRAFIRSLSVPQEQEPAVRFETEPGRQMQVDWGTMRNGRSPLHVFVAVLGYSRMLYIEFTDNMRYDTLETCHRNAFRFFGGVPREVLYDNMKTVVLQRDAYQTGQHRFHPSLWQFGKEMGFSPRLCRPFRAQTKGKVERMVQYTRNSFYIPLMTRLRPMGITVDVETANRHGLRWLHDVANQRKHETIQARPCDRWLEEQQSMLALPPEKKEYDVHLDENLVNFDKHPLHHPLSIYDSFCRGVA']

type: CDS
location: [1105:1888](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478713', 'GeneID:2767716']
    Key: locus_tag, Value: ['YP_pPCP02']
    Key: note, Value: ['similar to corresponding CDS form previously sequenced pPCP plasmid of Yersinia pestis KIM (AF053945) and CO92 (AL109969), also many ATP-binding protein entries for insertion sequence IS100 of Yersinia pestis. Contains Chaperonin clpA/B (Interpro|IPR001270). Contains ATP/GTP-binding site motif A (P-loop) (Interpro|IPR001687, Molecular Function: ATP binding (GO:0005524)). Contains Bacterial chromosomal replication initiator protein, DnaA (Interpro|IPR001957, Molecular Function: DNA binding (GO:0003677), Molecular Function: DNA replication origin binding (GO:0003688), Molecular Function: ATP binding (GO:0005524), Biological Process: DNA replication initiation (GO:0006270), Biological Process: regulation of DNA replication (GO:0006275)). Contains AAA ATPase (Interpro|IPR003593, Molecular Function: nucleotide binding (GO:0000166))']
    Key: product, Value: ['transposase/IS protein']
    Key: protein_id, Value: ['NP_995568.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MMMELQHQRLMALAGQLQLESLISAAPALSQQAVDQEWSYMDFLEHLLHEEKLARHQRKQAMYTRMAAFPAVKTFEEYDFTFATGAPQKQLQSLRSLSFIERNENIVLLGPSGVGKTHLAIAMGYEAVRAGIKVRFTTAADLLLQLSTAQRQGRYKTTLQRGVMAPRLLIIDEIGYLPFSQEEAKLFFQVIAKRYEKSAMILTSNLPFGQWDQTFAGDAALTSAMLDRILHHSHVVQIKGESYRLRQKRKAGVIAEANPE']

type: CDS
location: [2924:3119](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478714', 'GeneID:2767717']
    Key: gene, Value: ['rop']
    Key: gene_synonym, Value: ['rom']
    Key: locus_tag, Value: ['YP_pPCP03']
    Key: note, Value: ['Best Blastp hit =gi|16082682|ref|NP_395229.1| (NC_003132) putative replication regulatory protein [Yersinia pestis], gi|5763813|emb|CAB531 66.1| (AL109969) putative replication regulatory protein [Yersinia pestis]; similar to gb|AAK91579.1| (AY048853), RNAI modulator protein Rom [Salmonella choleraesuis], Contains Regulatory protein Rop (Interpro|IPR000769)']
    Key: product, Value: ['putative replication regulatory protein']
    Key: protein_id, Value: ['NP_995569.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MNKQQQTALNMARFIRSQSLILLEKLDALDADEQAAMCERLHELAEELQNSIQARFEAESETGT']

type: CDS
location: [3485:3857](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478715', 'GeneID:2767720']
    Key: locus_tag, Value: ['YP_pPCP04']
    Key: note, Value: ['Best Blastp hit = gi|321919|pir||JQ1541 hypothetical 16.9K protein - Salmonella typhi murium plasmid NTP16.']
    Key: product, Value: ['hypothetical protein']
    Key: protein_id, Value: ['NP_995570.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MSKKRRPQKRPRRRRFFHRLRPPDEHHKNRRSSQRWRNPTGLKDTRRFPPEAPSCALLFRPCRLPDTSPPFSLREAWRFLIAHAVGISVRCRSFAPSWAVCTNPPFSPTTAPYPVTIVLSPTR']

type: CDS
location: [4342:4780](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478716', 'GeneID:2767712']
    Key: gene, Value: ['pim']
    Key: locus_tag, Value: ['YP_pPCP05']
    Key: note, Value: ['similar to many previously sequenced pesticin immunity protein entries of Yersinia pestis plasmid pPCP, e.g. gi| 16082683|,ref|NP_395230.1| (NC_003132) , gi|1200166|emb|CAA90861.1| (Z54145 ) , gi|1488655| emb|CAA63439.1| (X92856) , gi|2996219|gb|AAC62543.1| (AF053945) , and gi|5763814|emb|CAB531 67.1| (AL109969)']
    Key: product, Value: ['pesticin immunity protein']
    Key: protein_id, Value: ['NP_995571.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MGGGMISKLFCLALIFLSSSGLAEKNTYTAKDILQNLELNTFGNSLSHGIYGKQTTFKQTEFTNIKSNTKKHIALINKDNSWMISLKILGIKRDEYTVCFEDFSLIRPPTYVAIHPLLIKKVKSGNFIVVKEIKKSIPGCTVYYH']

type: CDS
location: [4814:5888](-)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478717', 'GeneID:2767721']
    Key: gene, Value: ['pst']
    Key: locus_tag, Value: ['YP_pPCP06']
    Key: note, Value: ['Best Blastp hit =|16082684|ref|NP_395231.1| (NC_003132) pesticin [Yersinia pestis], gi|984824|gb|AAA75369.1| (U31974) pesticin [Yersinia pestis], gi|1488654|emb|CAA63438.1| (X92856) pesticin [Yersinia pestis], gi|2996220|gb|AAC62544.1| (AF053945) pesticin [Yersinia pestis], gi|5763815|emb|CAB53168.1| (AL1099 69) pesticin [Yersinia pestis]']
    Key: product, Value: ['pesticin']
    Key: protein_id, Value: ['NP_995572.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MSDTMVVNGSGGVPAFLFSGSTLSSYRPNFEANSITIALPHYVDLPGRSNFKLMYIMGFPIDTEMEKDSEYSNKIRQESKISKTEGTVSYEQKITVETGQEKDGVKVYRVMVLEGTIAESIEHLDKKENEDILNNNRNRIVLADNTVINFDNISQLKEFLRRSVNIVDHDIFSSNGFEGFNPTSHFPSNPSSDYFNSTGVTFGSGVDLGQRSKQDLLNDGVPQYIADRLDGYYMLRGKEAYDKVRTAPLTLSDNEAHLLSNIYIDKFSHKIEGLFNDANIGLRFSDLPLRTRTALVSIGYQKGFKLSRTAPTVWNKVIAKDWNGLVNAFNNIVDGMSDRRKREGALVQKDIDSGLLK']

type: CDS
location: [6004:6421](+)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478718', 'GeneID:2767719']
    Key: locus_tag, Value: ['YP_pPCP07']
    Key: note, Value: ['Best Blastp hit = gi|16082685|ref|NP_395232.1| (NC_003132) hypothetical protein [Yersinia pestis], gi|5763816|emb|CAB53169.1| (AL109969) hypothetical protein [Yersinia pestis]']
    Key: product, Value: ['hypothetical protein']
    Key: protein_id, Value: ['NP_995573.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MKFHFCDLNHSYKNQEGKIRSRKTAPGNIRKKQKGDNVSKTKSGRHRLSKTDKRLLAALVVAGYEERTARDLIQKHVYTLTQADLRHLVSEISNGVGQSQAYDAIYQARRIRLARKYLSGKKPEGVEPREGQEREDLP']

type: CDS
location: [6663:7602](+)
qualifiers:
    Key: EC_number, Value: ['3.4.23.48']
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478719', 'GeneID:2767715']
    Key: gene, Value: ['pla']
    Key: locus_tag, Value: ['YP_pPCP08']
    Key: note, Value: ['outer membrane protease; involved in virulence in many organisms; OmpT; IcsP; SopA; Pla; PgtE; omptin; in Escherichia coli OmpT can degrade antimicrobial peptides; in Yersinia Pla activates plasminogen during infection; in Shigella flexneria SopA cleaves the autotransporter IcsA']
    Key: product, Value: ['outer membrane protease']
    Key: protein_id, Value: ['NP_995574.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MKKSSIVATIITILSGSANAASSQLIPNISPDSFTVAASTGMLSGKSHEMLYDAETGRKISQLDWKIKNVAILKGDISWDPYSFLTLNARGWTSLASGSGNMDDYDWMNENQSEWTDHSSHPATNVNHANEYDLNVKGWLLQDENYKAGITAGYQETRFSWTATGGSYSYNNGAYTGNFPKGVRVIGYNQRFSMPYIGLAGQYRINDFELNALFKFSDWVRAHDNDEHYMRDLTFREKTSGSRYYGTVINAGYYVTPNAKVFAEFTYSKYDEGKGGTQIIDKNSGDSVSIGGDAAGISNKNYTVTAGLQYRF']

type: CDS
location: [7788:8088](-)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478720', 'GeneID:2767713']
    Key: locus_tag, Value: ['YP_pPCP09']
    Key: note, Value: ['Best Blastp hit = gi|16082687|ref|NP_395234.1| (NC_003132) putative transcriptional regulator [Yersinia pestis], gi|5763818|emb|CAB53171.1| (AL109969) putative transcriptional regulator [Yersinia pestis].']
    Key: product, Value: ['putative transcriptional regulator']
    Key: protein_id, Value: ['NP_995575.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MRTLDEVIASRSPESQTRIKEMADEMILEVGLQMMREELQLSQKQVAEAMGISQPAVTKLEQRGNDLKLATLKRYVEAMGGKLSLDVELPTGRRVAFHV']

type: CDS
location: [8087:8360](-)
qualifiers:
    Key: codon_start, Value: ['1']
    Key: db_xref, Value: ['GI:45478721', 'GeneID:2767714']
    Key: locus_tag, Value: ['YP_pPCP10']
    Key: note, Value: ['Best Blastp hit = gi|16082688|ref|NP_395235.1| (NC_003132) hypothetical protein [ Yersinia pestis], gi|5763819|emb|CAB53172.1| (AL109969) hypothetical protein [Yersinia pestis]']
    Key: product, Value: ['hypothetical protein']
    Key: protein_id, Value: ['NP_995576.1']
    Key: transl_table, Value: ['11']
    Key: translation, Value: ['MADLKKLQVYGPELPRPYADTVKGSRYKNMKELRVQFSGRPIRAFYAFDPIRRAIVLCAGDKSNDKRFYEKLVRIAEDEFTAHLNTLESK']
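Printing the features lists the annotations. To obtain the CDS sequences themselves, each feature can be extracted from the parent sequence and, if wanted, written out as FASTA. This is a minimal sketch that uses a hypothetical toy record in place of `gene_record`. The record IDs come from invented `locus_tag` values, and the output goes to an in-memory buffer rather than a file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record standing in for gene_record.
toy_record = SeqRecord(
    Seq("CCATGAAATTTGGGTAAGGTCAGGGCATCC"),
    id="toy",
    features=[
        SeqFeature(FeatureLocation(2, 17, strand=1), type="CDS",
                   qualifiers={"locus_tag": ["toy_01"]}),
        SeqFeature(FeatureLocation(19, 28, strand=-1), type="CDS",
                   qualifiers={"locus_tag": ["toy_02"]}),
    ],
)

# Build one SeqRecord per CDS, named after its locus_tag.
cds_records = []
for feature in toy_record.features:
    if feature.type == "CDS":
        cds_records.append(
            SeqRecord(
                feature.extract(toy_record.seq),
                id=feature.qualifiers["locus_tag"][0],
                description="",
            )
        )

# Write to an in-memory buffer; a real script would pass a file path or handle.
buffer = StringIO()
written = SeqIO.write(cds_records, buffer, "fasta")
print(written, "CDS records written")
print(buffer.getvalue())
```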