ccs

Reputation: 57

Translate DNA sequences to protein sequences within a pandas DataFrame

I have a pandas dataframe that contains DNA sequences and gene names. I want to translate the DNA sequences into protein sequences, and store the protein sequences in a new column.

The data frame looks like:

DNA gene_name
ATGGATAAG gene_1
ATGCAGGAT gene_2

After translating the DNA and storing the result, the dataframe would look like:

DNA gene_name protein
ATGGATAAG... gene_1 MDK...
ATGCAGGAT... gene_2 MQD...

I am aware that Biopython (https://biopython.org/wiki/Seq) can translate DNA to protein, for example:

>>> from Bio.Seq import Seq
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*')

However, I am not sure how to implement this in the context of a dataframe. Any help would be much appreciated!

Upvotes: 1

Views: 1036

Answers (2)

Cody Smith

Reputation: 36

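I would suggest using pandas.Series.apply on the "DNA" column. Series.apply works element by element and takes no axis argument, and wrapping the result in str() stores plain strings rather than Seq objects.

Something like:

from Bio.Seq import Seq
df['protein'] = df['DNA'].apply(lambda x: str(Seq(x).translate()))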

Upvotes: 2

user7864386

Reputation:

Since you want to translate each sequence in the "DNA" column, you could use a list comprehension:

df['protein'] = [str(Seq(sq).translate()) for sq in df['DNA']]

Output:

         DNA gene_name protein
0  ATGGATAAG    gene_1     MDK
1  ATGCAGGAT    gene_2     MQD
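
For completeness, here is a minimal self-contained sketch of the same idea, assuming pandas and Biopython are installed. The frame is rebuilt from the two rows in the question, and translate_dna is just an illustrative helper name, not part of either library.

```python
import pandas as pd
from Bio.Seq import Seq

# Example frame mirroring the two rows shown in the question.
df = pd.DataFrame({
    "DNA": ["ATGGATAAG", "ATGCAGGAT"],
    "gene_name": ["gene_1", "gene_2"],
})

def translate_dna(dna: str) -> str:
    """Hypothetical helper: translate a DNA string with Biopython's
    default (standard) codon table and return a plain Python str."""
    return str(Seq(dna).translate())

# Apply the helper to every value in the DNA column.
df["protein"] = df["DNA"].map(translate_dna)
print(df)
```

If your sequences contain in-frame stop codons, Seq.translate(to_stop=True) stops at the first one instead of emitting "*". Biopython also warns when a sequence length is not a multiple of three, so it may be worth checking for that before translating.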

Upvotes: 1
