I've tried other solutions for this error on SO, but they are all related to Python's input or raw_input and didn't solve my problem.
import pandas as pd
from io import StringIO

txt = '''series NAME VAL1 VAL2
0 AAA 27 678
1 BBB 45 744
2 CCC 34 275
3 AAA 29 932
4 CCC 47 288
5 BBB 24 971
'''
df = pd.read_table(StringIO(txt), sep=r'\s+')
del df['series']
df = df.groupby('NAME').apply(lambda x: x.max()-x.min())
TypeError: unsupported operand type(s) for -: 'str' and 'str'
But if I call max and min individually, they work. I've checked the types of columns VAL1 and VAL2, and both are int64.
Upvotes: 1
Views: 4974
This is a bug up until v0.22. From v0.23 onwards, non-numeric columns are ignored by default.
Unfortunately, groupby.apply will attempt to run your lambda on every column, including the column you've grouped on ("NAME", which is a string).
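To see why the string column is the culprit, here is a minimal sketch (not from the original post; the two small Series are made up for illustration). It shows that max and min each succeed on strings, which matches what the question observed, while subtracting the two results does not:

```python
import pandas as pd

# Illustrative data only: one string column and one integer column.
names = pd.Series(["AAA", "BBB", "CCC"], name="NAME")
values = pd.Series([27, 45, 34], name="VAL1")

# max() and min() each work on strings (lexicographic comparison) ...
print(names.max(), names.min())

# ... but subtracting the two results fails, because str does not support "-".
try:
    names.max() - names.min()
except TypeError as exc:
    print("TypeError:", exc)

# The same expression on a numeric column is fine.
print(values.max() - values.min())
```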
You can confirm by checking the difference between
df.groupby('NAME')[['VAL1', 'VAL2']].apply(lambda x: x.max() - x.min())
VAL1 VAL2
NAME
AAA 2 254
BBB 21 227
CCC 13 13
Versus
df.groupby('NAME')['NAME'].apply(lambda x: x.max() - x.min())
---------------------------------------------------------------------------
TypeError
Basically, explicit is better than implicit.
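For reference, the explicit-column version can be run end to end as below. This is a self-contained sketch that rebuilds the question's data; using pd.read_csv and drop(columns=...) instead of read_table and del is my own choice, not something the original post prescribes.

```python
from io import StringIO

import pandas as pd

txt = """series NAME VAL1 VAL2
0 AAA 27 678
1 BBB 45 744
2 CCC 34 275
3 AAA 29 932
4 CCC 47 288
5 BBB 24 971
"""

# Parse the whitespace-separated text and discard the helper column.
df = pd.read_csv(StringIO(txt), sep=r"\s+").drop(columns="series")

# Select the numeric columns explicitly before applying the lambda,
# so the string column NAME is never passed to the subtraction.
result = df.groupby("NAME")[["VAL1", "VAL2"]].apply(lambda x: x.max() - x.min())
print(result)
```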
Alternatively, select all numeric columns and pass a Series as the grouper, so you don't have to list out each column individually. Note that this is slower than grouping on a column that belongs to the DataFrame.
df.select_dtypes('number').groupby(df.NAME).apply(lambda x: x.max() - x.min())
VAL1 VAL2
NAME
AAA 2 254
BBB 21 227
CCC 13 13
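The same idea as a runnable sketch, assuming the question's data is rebuilt from a dict (the dict construction is only for brevity here and is not how the original post loads the data):

```python
import pandas as pd

# Same values as in the question, without the "series" column.
df = pd.DataFrame(
    {
        "NAME": ["AAA", "BBB", "CCC", "AAA", "CCC", "BBB"],
        "VAL1": [27, 45, 34, 29, 47, 24],
        "VAL2": [678, 744, 275, 932, 288, 971],
    }
)

# Keep only numeric columns; NAME is dropped from the frame being aggregated.
numeric = df.select_dtypes("number")

# Group by the NAME Series from the original frame; rows are aligned on the index.
result = numeric.groupby(df["NAME"]).apply(lambda x: x.max() - x.min())
print(result)
```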
Thanks to @JC.
Upvotes: 2