💻 B1: Parsing files
Requirement description
Implement examples of parsing the FASTA and GenBank formats, as explained in section 2.4 of the tutorial.
FASTA datasets:
Setaria italica strain Yugu1 chromosome VIII, whole genome shotgun sequence
ENA: CM003535 [1]
Variola major virus (strain Bangladesh-1975) complete genome
GenBank: L22579.1 [2]
Bovine papular stomatitis virus, complete genome
NCBI Reference Sequence: NC_005337.1 [3]
Homo sapiens hepatitis A virus cellular receptor 1 (HAVCR1), transcript variant 1, mRNA
NCBI Reference Sequence: NM_012206.3 [4]
Importing the main module:
from Bio import SeqIO
1: Setaria italica strain Yugu1 chromosome VIII, whole genome shotgun sequence
FASTA file:
# Print the identifier, a truncated view of the sequence, and its length
for seq_record in SeqIO.parse("fasta-examples/CM003535.1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
ENA|CM003535|CM003535.1
Seq('ACAGTCGTCGACACAGGGCGATTCTATAAAACGGGTCTGGAGGCCATTTTCACG...AGG')
40689132
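The three printed values map directly onto the FASTA layout: a header line starting with ">" whose first whitespace-delimited token becomes the record id, followed by sequence lines that are joined into one string. The following is a minimal standard-library sketch of that idea, not Biopython's implementation; the function name read_fasta and the toy records are assumptions made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted text handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The id is the first whitespace-delimited token after ">".
            fields = line[1:].split(None, 1)
            record_id = fields[0] if fields else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if record_id is not None:
        yield record_id, "".join(chunks)


# Toy data, made up for illustration (not taken from the datasets above).
toy = io.StringIO(">toy|ID1 first example record\nACGT\nAC\n>toy|ID2\nGGG\n")
records = list(read_fasta(toy))
for record_id, sequence in records:
    print(record_id, len(sequence))
```

For real files, SeqIO.parse remains the better choice, since it returns full SeqRecord objects with descriptions and annotations.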
2: Variola major virus (strain Bangladesh-1975) complete genome
FASTA file:
for seq_record in SeqIO.parse("fasta-examples/variola-major-virus-1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
L22579.1
Seq('TAGTTAGATAAATTAATAATACATAAGTTTTAATACATTAATATTATATTATAC...CTT')
186103
GenBank file:
for seq_record in SeqIO.parse("fasta-examples/variola-major-virus-2.gb", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
L22579.1
Seq('TAGTTAGATAAATTAATAATACATAAGTTTTAATACATTAATATTATATTATAC...CTT')
186103
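The FASTA and GenBank outputs above show the same id, the same sequence preview, and the same length, but repr() only displays the start and end of the sequence. A small helper can compare the full sequences instead. This is a sketch under assumptions: records_match is a hypothetical name, and Rec is a stand-in type used here so the example runs without any input files; any object with id and seq attributes, such as a SeqRecord, should work the same way.

```python
from collections import namedtuple


def records_match(a, b):
    """Return True if two record-like objects share id and full sequence.

    The comparison is case-insensitive, since sequence files may store
    bases in lower or upper case.
    """
    return a.id == b.id and str(a.seq).upper() == str(b.seq).upper()


# Stand-in record type for illustration only (hypothetical, not Biopython's).
Rec = namedtuple("Rec", ["id", "seq"])

same = records_match(Rec("X1.1", "ACGT"), Rec("X1.1", "acgt"))
different = records_match(Rec("X1.1", "ACGT"), Rec("X1.1", "ACGA"))
print(same, different)
```

With the files from this example, the call would look like records_match(SeqIO.read("fasta-examples/variola-major-virus-1.fasta", "fasta"), SeqIO.read("fasta-examples/variola-major-virus-2.gb", "genbank")). SeqIO.read expects exactly one record per file, which matches the single-record outputs shown here.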
3: Bovine papular stomatitis virus, complete genome
FASTA file:
for seq_record in SeqIO.parse("fasta-examples/bovine-papular-stomatitis-virus-1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NC_005337.1
Seq('GAACTCGGGAGGCGGTCGTGCGGACGCACGGACGCACGGACGCACGGACGGGCT...TTC')
134431
GenBank file:
for seq_record in SeqIO.parse("fasta-examples/bovine-papular-stomatitis-virus-2.gb", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NC_005337.1
Seq('GAACTCGGGAGGCGGTCGTGCGGACGCACGGACGCACGGACGCACGGACGGGCT...TTC')
134431
4: Homo sapiens hepatitis A virus cellular receptor 1 (HAVCR1), transcript variant 1, mRNA
FASTA file:
for seq_record in SeqIO.parse("fasta-examples/HAVCR1-1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NM_012206.3
Seq('GACCAGGAGTCAGTTTGGCGGTTATGTGTGGGGAAGAAGCTGGGAAGTCAGGGG...GAA')
1713
GenBank file:
for seq_record in SeqIO.parse("fasta-examples/HAVCR1-2.gb", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NM_012206.3
Seq('GACCAGGAGTCAGTTTGGCGGTTATGTGTGGGGAAGAAGCTGGGAAGTCAGGGG...GAA')
1713
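All of the examples above repeat the same loop and differ only in the file path and the format name. One way to reduce the repetition is to derive the format name from the file extension. The sketch below assumes the naming used in this notebook (.fasta and .gb); guess_format and FORMAT_BY_SUFFIX are hypothetical names, and the extra .fa and .gbk entries are an assumption about other common extensions.

```python
from pathlib import Path

# Map file extensions to the format names passed to SeqIO.parse.
# ".fa" and ".gbk" are assumed extras; this notebook uses ".fasta" and ".gb".
FORMAT_BY_SUFFIX = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".gb": "genbank",
    ".gbk": "genbank",
}


def guess_format(path):
    """Return the format name for a sequence file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unrecognised sequence file extension: {suffix!r}") from None


# File names as used in the examples above.
paths = [
    "fasta-examples/CM003535.1.fasta",
    "fasta-examples/variola-major-virus-2.gb",
    "fasta-examples/HAVCR1-1.fasta",
]
for path in paths:
    print(path, guess_format(path))
```

Each loop could then be written as SeqIO.parse(path, guess_format(path)), so the per-file cells reduce to one loop over a list of paths.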