Reputation: 3657
I have a dataframe df
oligo_name oligo_sequence
AAAAA attttggggctggtaa
BBBBB attttcccgaatgtca
and so on. To calculate the GC content of each sequence, I did the following:
from Bio.SeqUtils import GC
df['GC content'] = GC(df['oligo_sequence'])
but I get the following error:
KeyError: 'Level G must be same as name (None)'
Can you suggest a fix, or a better way to calculate the GC content of sequences in a pandas dataframe? Thanks.
Upvotes: 1
Views: 666
Reputation: 394031
The following worked for me:
In [23]:
df['GC content'] = df['oligo_sequence'].apply(GC)
df
Out[23]:
oligo_name oligo_sequence GC content
0 AAAAA attttggggctggtaa 43.75
1 BBBBB attttcccgaatgtca 37.50
You can't pass a Series as an argument to a function unless that function understands pandas Series or array types. GC expects a single sequence, so instead call apply on the column and pass the function as the argument; apply then calls that function once for every value in the Series, as shown above.
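For anyone who wants to try the pattern without Biopython installed, here is a minimal sketch. Note that gc_percent is a hypothetical stand-in I wrote for illustration, not the Bio.SeqUtils implementation; it only counts plain G and C characters and ignores ambiguity codes. The second column shows a possible vectorized alternative using the pandas string methods, assuming the sequences contain no missing values.

```python
import pandas as pd


def gc_percent(seq: str) -> float:
    """Hypothetical stand-in for Bio.SeqUtils.GC: percent of G/C in one sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Same two rows as in the question.
df = pd.DataFrame({
    "oligo_name": ["AAAAA", "BBBBB"],
    "oligo_sequence": ["attttggggctggtaa", "attttcccgaatgtca"],
})

# apply calls gc_percent once per value, so the function only ever sees a str.
df["GC content"] = df["oligo_sequence"].apply(gc_percent)

# Vectorized alternative: count G/C with a regex character class on the whole column.
seqs = df["oligo_sequence"].str.upper()
df["GC content (vectorized)"] = 100.0 * seqs.str.count("[GC]") / seqs.str.len()

print(df)
```

Both columns should agree with the 43.75 and 37.50 shown in the output above. The apply version works with any function that takes a single string, while the vectorized one avoids a Python-level call per row.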
Upvotes: 1