Read text file of protein sequences in python

Question

I am trying to read protein sequences into a pandas DataFrame, but I am not getting the whole sequence in the DataFrame column.

I have tried both the built-in open() and a plain read_csv call, but neither gave me what I need.

pd.read_csv('../input/data 1/non-cpp.txt', index_col=0, header=None)

Output:

0
>
GNNRPVYIPQPRPPHPRI
>
HGVSGHGQHGVHG
>

myfile = open("../input/data 1/non-cpp.txt")
for line in myfile:
    print(line)
myfile.close()

Output:

>

GNNRPVYIPQPRPPHPRI

>

HGVSGHGQHGVHG

>

QRFSQPTFKLPQGRLTLSRKF

>

FLPVLAGIAAKVVPALFCKITKKC

DataSet Source

Label of Sequence
long Sequence (String)

I need the labels (the first line of each record) in one column and the whole sequence (the second line of each record) in a second column, e.g.:

Label    Sequence

Constanza Garcia · Accepted Answer

This is rough and not a one-liner, but it will give you what you need: a Series containing the sequences.

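import pandas as pd

# With sep=">", each bare ">" line becomes a NaN row and each sequence lands in column 0.
data = pd.read_csv('cpp.txt', sep=">", header=None)

# Drop the NaN rows, keeping only the sequences.
sequences = data[0].dropna()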

I hope it helps
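If you also want the labels next to the sequences, as the question asks, here is a minimal sketch of one way to do it. It assumes each record starts with a line beginning with ">" (any text after the ">" is treated as the label, which is empty in the sample shown in the question) followed by one or more sequence lines. The function name read_sequences and the column names are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd


def read_sequences(handle):
    """Parse '>'-delimited records into a DataFrame (hypothetical helper).

    Assumption: a line starting with '>' opens a record, and the lines
    that follow, up to the next '>', are pieces of that record's sequence.
    """
    records = []
    label, parts = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Close the previous record, skipping headers with no sequence.
            if label is not None and parts:
                records.append((label, "".join(parts)))
            label, parts = line[1:].strip(), []
        elif label is not None:
            parts.append(line)
    # Close the final record, if it has a sequence.
    if label is not None and parts:
        records.append((label, "".join(parts)))
    return pd.DataFrame(records, columns=["Label", "Sequence"])


# In-memory sample copied from the question; labels are empty because
# the header lines there contain only ">".
sample = io.StringIO(">\nGNNRPVYIPQPRPPHPRI\n>\nHGVSGHGQHGVHG\n>\n")
df = read_sequences(sample)
```

For the real file you would pass an open file object instead of the StringIO sample, for example inside a `with open(...) as fh:` block. Because the sequence lines are joined per record, a sequence that wraps over several lines still ends up whole in a single cell.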
