💻 B1: Parsing files

Requirement description

Implement examples of parsing the FASTA and GenBank formats, as explained in section 2.4 of the tutorial.

FASTA datasets:

  1. Setaria italica strain Yugu1 chromosome VIII, whole genome shotgun sequence
    ENA: CM003535 [1]

  2. Variola major virus (strain Bangladesh-1975) complete genome
    GenBank: L22579.1 [2]

  3. Bovine papular stomatitis virus, complete genome
    NCBI Reference Sequence: NC_005337.1 [3]

  4. Homo sapiens hepatitis A virus cellular receptor 1 (HAVCR1), transcript variant 1, mRNA
    NCBI Reference Sequence: NM_012206.3 [4]

Importing the main module:

from Bio import SeqIO

1: Setaria italica strain Yugu1 chromosome VIII, whole genome shotgun sequence

FASTA file:

for seq_record in SeqIO.parse("fasta-examples/CM003535.1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
ENA|CM003535|CM003535.1
Seq('ACAGTCGTCGACACAGGGCGATTCTATAAAACGGGTCTGGAGGCCATTTTCACG...AGG')
40689132
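
The id printed above, `ENA|CM003535|CM003535.1`, is the first whitespace-delimited word of the FASTA header line, which is how Biopython's FASTA parser derives `seq_record.id`. The following is a minimal pure-Python sketch of that idea, not Biopython's actual implementation. The function names `parse_fasta` and `_build` are hypothetical, the header description text is illustrative, and the sequence is only the first 20 bases shown in the output above.

```python
def _build(header, chunks):
    """Turn a header line and its sequence chunks into one record tuple."""
    # The id is taken as the first whitespace-delimited word of the header.
    parts = header.split(None, 1)
    record_id = parts[0] if parts else ""
    return record_id, header, "".join(chunks)


def parse_fasta(lines):
    """Yield (record_id, description, sequence) tuples from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _build(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _build(header, chunks)


# Illustrative input: the description after the id is an assumption,
# and the sequence is truncated to 20 bases.
example = [
    ">ENA|CM003535|CM003535.1 Setaria italica strain Yugu1 chromosome VIII",
    "ACAGTCGTCG",
    "ACACAGGGCG",
]

records = list(parse_fasta(example))
for record_id, description, sequence in records:
    print(record_id)
    print(len(sequence))
```

For real files, `SeqIO.parse` remains the better choice, since it also handles file opening and returns full `SeqRecord` objects.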

2: Variola major virus (strain Bangladesh-1975) complete genome

FASTA file:

for seq_record in SeqIO.parse("fasta-examples/variola-major-virus-1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
L22579.1
Seq('TAGTTAGATAAATTAATAATACATAAGTTTTAATACATTAATATTATATTATAC...CTT')
186103

GenBank file:

for seq_record in SeqIO.parse("fasta-examples/variola-major-virus-2.gb", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
L22579.1
Seq('TAGTTAGATAAATTAATAATACATAAGTTTTAATACATTAATATTATATTATAC...CTT')
186103
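
The FASTA and GenBank files for this dataset print the same id, sequence preview, and length. One way to check that programmatically is sketched below. The helper names `same_record` and `compare_files` are hypothetical, and `compare_files` assumes each file holds exactly one record, because `SeqIO.read` raises an error otherwise. The demonstration uses short in-memory records instead of the real files, so `compare_files` is defined but not called.

```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def same_record(rec_a, rec_b):
    """Return True if two SeqRecord objects share the same id and sequence."""
    return rec_a.id == rec_b.id and str(rec_a.seq) == str(rec_b.seq)


def compare_files(fasta_path, genbank_path):
    """Compare a single-record FASTA file with a single-record GenBank file."""
    # SeqIO.read expects exactly one record in each file (an assumption here).
    fasta_rec = SeqIO.read(fasta_path, "fasta")
    genbank_rec = SeqIO.read(genbank_path, "genbank")
    return same_record(fasta_rec, genbank_rec)


# In-memory demonstration with short, made-up sequence fragments.
rec_a = SeqRecord(Seq("TAGTTAGATA"), id="L22579.1")
rec_b = SeqRecord(Seq("TAGTTAGATA"), id="L22579.1")
rec_c = SeqRecord(Seq("TAGTTAGATT"), id="L22579.1")

print(same_record(rec_a, rec_b))
print(same_record(rec_a, rec_c))
```

With the files used in this notebook, the call would look like `compare_files("fasta-examples/variola-major-virus-1.fasta", "fasta-examples/variola-major-virus-2.gb")`.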

3: Bovine papular stomatitis virus, complete genome

FASTA file:

for seq_record in SeqIO.parse("fasta-examples/bovine-papular-stomatitis-virus-1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NC_005337.1
Seq('GAACTCGGGAGGCGGTCGTGCGGACGCACGGACGCACGGACGCACGGACGGGCT...TTC')
134431

GenBank file:

for seq_record in SeqIO.parse("fasta-examples/bovine-papular-stomatitis-virus-2.gb", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NC_005337.1
Seq('GAACTCGGGAGGCGGTCGTGCGGACGCACGGACGCACGGACGCACGGACGGGCT...TTC')
134431

4: Homo sapiens hepatitis A virus cellular receptor 1 (HAVCR1), transcript variant 1, mRNA

FASTA file:

for seq_record in SeqIO.parse("fasta-examples/HAVCR1-1.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NM_012206.3
Seq('GACCAGGAGTCAGTTTGGCGGTTATGTGTGGGGAAGAAGCTGGGAAGTCAGGGG...GAA')
1713

GenBank file:

for seq_record in SeqIO.parse("fasta-examples/HAVCR1-2.gb", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
NM_012206.3
Seq('GACCAGGAGTCAGTTTGGCGGTTATGTGTGGGGAAGAAGCTGGGAAGTCAGGGG...GAA')
1713