Consider the following data.
Species,Gene,ExonCount
Amel,g1,3
Amel,g2,1
Amel,g3,5
Sinv,g4,1
Sinv,g5,1
Sinv,g6,2
Sinv,g7,2
I would like to determine the number of entries with exon count = 1, grouped by species. This is what I've come up with so far.
import io
import pandas
instream = io.StringIO("""Species,Gene,ExonCount
Amel,g1,3
Amel,g2,1
Amel,g3,5
Sinv,g4,1
Sinv,g5,1
Sinv,g6,2
Sinv,g7,2
""")
data = pandas.read_csv(instream)
for spec in data['Species'].unique():
    ones = sum([1 for x in data.loc[(data.Species == spec)]['ExonCount'] if x == 1])
    print(spec, ones)
It seems to work correctly, but is not elegant and I'm guessing it's not efficient on large dataframes. Is there a better / cleaner / more Pythonic way to do this?
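For comparison, here is a minimal sketch of one vectorized alternative, assuming the same sample data as above. It builds a boolean mask for `ExonCount == 1`, groups that mask by species, and sums it (each `True` counts as 1). The name `single_exon_counts` is only illustrative, and I have not measured whether this is actually faster on large dataframes.

```python
import io
import pandas

instream = io.StringIO("""Species,Gene,ExonCount
Amel,g1,3
Amel,g2,1
Amel,g3,5
Sinv,g4,1
Sinv,g5,1
Sinv,g6,2
Sinv,g7,2
""")
data = pandas.read_csv(instream)

# Boolean Series: True where a gene has exactly one exon.
is_single_exon = data['ExonCount'] == 1

# Group the mask by species and sum it; each True contributes 1 to the count.
# (single_exon_counts is a hypothetical name chosen for this sketch.)
single_exon_counts = is_single_exon.groupby(data['Species']).sum()

for spec, ones in single_exon_counts.items():
    print(spec, ones)
```

Grouping the mask, rather than filtering the rows first, means a species with no single-exon genes would still show up with a count of 0.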