Ssank

Finding the GC content of sequences in a pandas DataFrame

I have a DataFrame df:

oligo_name  oligo_sequence
AAAAA       attttggggctggtaa
BBBBB       attttcccgaatgtca

and so on. To calculate the GC content of each sequence, I did the following:

from Bio.SeqUtils import GC

df['GC content'] = GC(df['oligo_sequence'])

but I get the following error:

KeyError: 'Level G must be same as name (None)'

Can you suggest a fix, or a better way to calculate the GC content of sequences in a pandas DataFrame? Thanks.


Answers (1)

EdChum

The following worked for me:

In [23]:

df['GC content'] = df['oligo_sequence'].apply(GC)
df
Out[23]:
  oligo_name    oligo_sequence  GC content
0      AAAAA  attttggggctggtaa       43.75
1      BBBBB  attttcccgaatgtca       37.50

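You can't pass a Series as an argument to a function unless that function understands a pandas Series (or the underlying array type). GC expects a single sequence: internally it calls seq.count("G"), seq.count("C") and so on, and in older pandas versions Series.count treats its first argument as an index level, which is where the KeyError comes from. Instead, call apply and pass the function as the argument; apply then calls that function on every value in the Series, as shown above.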
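If you'd rather not depend on Biopython at all, a minimal sketch is below. It assumes you only want to count G and C (case-insensitive); unlike Biopython's GC it ignores the ambiguity code S. The first approach is vectorised with Series.str.count and Series.str.len; the second uses apply with a small helper, gc_percent, which is a hypothetical name for illustration, not a library function. Note also that newer Biopython releases deprecate Bio.SeqUtils.GC in favour of gc_fraction, which returns a fraction between 0 and 1 rather than a percentage.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "oligo_name": ["AAAAA", "BBBBB"],
    "oligo_sequence": ["attttggggctggtaa", "attttcccgaatgtca"],
})

seqs = df["oligo_sequence"]

# Vectorised: str.count takes a regex, so "[GCgc]" counts G/C in either case;
# divide by each sequence's length and scale to a percentage.
# (An empty string gives 0/0, i.e. NaN, with this version.)
df["GC content"] = seqs.str.count("[GCgc]") / seqs.str.len() * 100


# Per-value alternative with apply. gc_percent is a hypothetical helper,
# not Biopython's GC; it counts only G and C and returns 0.0 for "".
def gc_percent(seq: str) -> float:
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


df["GC content (apply)"] = seqs.apply(gc_percent)
print(df)
```

For the two example sequences both columns agree with the Biopython result above (43.75 and 37.5). The vectorised version avoids calling a Python function once per row, which matters more as the DataFrame grows.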
