Reputation: 2508
If I have a dataset, like this:
import pandas as pd
df_fruit = pd.DataFrame({
"fruit1": ['apple pie','orange','banana pie','mango','grape juice','strawberry milkshake'],
"fruit2": ['apple','orange juice','banana','mango pie','grapes','strawberry'],
"fruit3": ['apple juice','orange pie','banana juice','mango','grapes','strawberry pie']
})
and now I'd like to know what the difference in pie counts is in each column (ie. by how many pies is one of the column richer), I can do this:
df_fruit['fruit1'].str.contains('pie').sum()-df_fruit['fruit2'].str.contains('pie').sum()
# 1
That is nice but readability counts (eg. imagine checking >3 columns). Is there a way how to find the relevant rows and print a) same as I did just without the need to access the columns separately, b) the pairwise difference in count of cells with a given string for given columns (eg. fruit1-fruit2 = 1
, fruit1-fruit3=
...)?
Thanks.
Upvotes: 0
Views: 39
Reputation: 15872
You can calculate pie
count per column, and then use pandas.DataFrame.eval
to do different operations:
>>> pie_count = df_fruit.apply(lambda x: x.str.contains('pie')).sum().to_frame().T
>>> pie_count
fruit1 fruit2 fruit3
0 2 1 2
>>> pie_count.eval('fruit1 - fruit2').item()
1
>>> pie_count.eval('fruit1 - fruit3').item()
1
If you are in python 3.8+:
>>> from itertools import permutations, combinations
>>> {(expr:= f"{c1} - {c2}"):
pie_count.eval(expr).item() for c1, c2 in permutations(pie_count.columns, r=2)}
{'fruit1 - fruit2': 1,
'fruit1 - fruit3': 0,
'fruit2 - fruit1': -1,
'fruit2 - fruit3': -1,
'fruit3 - fruit1': 0,
'fruit3 - fruit2': 1}
>>> {(expr:= f"{c1} - {c2}"):
pie_count.eval(expr).item() for c1, c2 in combinations(pie_count.columns, r=2)}
{'fruit1 - fruit2': 1, 'fruit1 - fruit3': 0, 'fruit2 - fruit3': -1}
To just get fruit1-fruit2
and fruit2-fruit3
:
>>> pie_count = df_fruit.apply(lambda x: x.str.contains('pie')).sum()
>>> pie_count['fruit1'] - pie_count.drop('fruit1')
# Or
>>> pie_count.drop('fruit1').rsub(pie_count['fruit1'])
fruit2 1
fruit3 0
dtype: int64
Upvotes: 1