Reputation: 3
The following code gives me a KeyError: 0. It works if I remove the [0] part.
Note that what I really want to do is take each sub-dataframe in the group, do some manipulation (that, for example, involves calculations between rows and between columns), and return a new dataframe. It is similar to ddply or data.table groupby operations in R.
import pandas as pd
df = pd.DataFrame(dict(a=list('XYXYXYXY'), b=list('AABBCCDD')))
df.groupby('a').apply(lambda x: x['b'][0])
Results:
KeyError Traceback (most recent call last)
<ipython-input-136-7b87ffbc2fd2> in <module>()
1 df = pd.DataFrame(dict(a=list('XYXYXYXY'), b=list('AABBCCDD')))
----> 2 df.groupby('a').apply(lambda x: x['b'][0])
Upvotes: 0
Views: 1217
Reputation: 323236
If you prefer to use apply:
df.groupby('a').b.apply(lambda x: x.values.tolist()[0])
Out[952]:
a
X A
Y A
Name: b, dtype: object
Or try
df.groupby('a').b.first()
Out[960]:
a
X A
Y A
Name: b, dtype: object
Upvotes: 0
Reputation: 40878
You're getting a KeyError because of the [0]. While it's not a perfect description, when you specify df.groupby('a') you're creating something like an iterator of (label, DataFrame) pairs, one per group, and it's not until you call apply that some function gets applied to each of those "sub-frames". For example:
for grp, frame in df.groupby('a'):
    print('Group', grp)
    print(frame)
    print()
Group X
a b
0 X A
2 X B
4 X C
6 X D
Group Y
a b
1 Y A
3 Y B
5 Y C
7 Y D
Using [0] will attempt to index by label, not integer location, and your DataFrame where a == Y is indexed with [1, 3, 5, 7]. In other words, you're trying to do:
df2 = df[df.a=='Y']
df2['b'][0] # Not only is this a key error, it's also chained indexing
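To make the label-versus-position distinction concrete, here is a minimal sketch that inspects the same sub-frame. The variable names (labels, has_label_zero and so on) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(a=list('XYXYXYXY'), b=list('AABBCCDD')))
df2 = df[df.a == 'Y']

# The sub-frame keeps the original row labels, so label 0 is absent.
labels = list(df2.index)
has_label_zero = 0 in df2.index

# .iloc selects by integer position, .loc selects by label.
first_by_position = df2['b'].iloc[0]
first_by_label = df2.loc[1, 'b']

print(labels, has_label_zero, first_by_position, first_by_label)
```

Both lookups return the first row of the Y group. Only the lookup by label 0 fails, because that label belongs to the X group.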
You may find this useful: How is pandas groupby method actually working?
The working version of your code would be
df.groupby('a').apply(lambda x: x.iloc[0,1])
But you should prefer @juanpa's solution, which will be faster here.
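If you want to stay with apply, one variant (a sketch, not the only way to write it) selects column b by name on the groupby object and then takes the first element by position. This avoids depending on the column order of the frame passed to the function, which, as far as I know, can differ between pandas versions depending on whether the grouping column is included.

```python
import pandas as pd

df = pd.DataFrame(dict(a=list('XYXYXYXY'), b=list('AABBCCDD')))

# Selecting 'b' first means each group reaches the lambda as a Series;
# .iloc[0] then picks the first element by position, not by label.
first_b = df.groupby('a')['b'].apply(lambda s: s.iloc[0])

print(first_b)
```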
Upvotes: 2
Reputation: 95948
@BradSolomon explained the source of your error. However, I think what you really want is the following:
In [7]: df.groupby('a')['b'].nth(0)
Out[7]:
a
X A
Y A
Name: b, dtype: object
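One caveat, hedged because it depends on your installed version: as I understand the pandas 2.0 release notes, nth now behaves as a filter and keeps the original row labels, so the index shown above (X, Y) reflects older behavior. The sketch below compares three ways of taking the first row per group. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(a=list('XYXYXYXY'), b=list('AABBCCDD')))
grouped = df.groupby('a')['b']

# first() returns the first non-null value per group, indexed by group key.
by_first = grouped.first()

# nth(0) returns the first row per group; its index depends on the pandas version.
by_nth = grouped.nth(0)

# head(1) keeps the original row labels of the selected rows.
by_head = grouped.head(1)

print(by_first, by_nth, by_head, sep='\n\n')
```

If you need the result indexed by group key regardless of version, first() is the safer choice, with the difference that it skips nulls.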
Upvotes: 1