I've written the code below to create a DataFrame and group it using an aggregation dict.
import pandas as pd
df = pd.DataFrame([["1598004168", "5", "10", "500"],["1598004168", "4", "8", "300"], ["1598004168", "3", "7", "600"], ["1598004168", "8", "6", "600"], ["1598004169", "2", "4", "100"], ["1598004169", "3", "2", "900"], ["1598004169", "3", "5", "300"], ["1598004170", "1", "8", "200"], ["1598004170", "4", "1", "400"], ["1598004170", "7", "3", "700"]], columns=["ts", "o", "c", "v"])
df = df.groupby(["ts"], as_index=False).agg({'o':'first', 'c':'last', 'v':'last'})
print(df)
The code above works fine. Now I want to get the 2nd item from every group for column 'v', with the expected output shown below.
Expected output:
ts o c v
0 1598004168 5 6 600
1 1598004169 2 5 300
2 1598004170 1 3 400
Please share the appropriate code for this; the code below does not work:
df = df.groupby(["ts"], as_index=False).agg({'o':'first', 'c':'last', 'v':'nth(2)'})
As the other answer states, you can use a lambda function. However, you have to check the length of each group's Series so you don't go out of bounds (IndexError) when a group has fewer than n + 1 rows.
For example:
import math
n = 2  # zero-based position, so 2 means the 3rd row of each group
df.groupby('ts').agg(
    o=('o', 'first'),
    c=('c', 'last'),
    vn=('v', lambda x: x.iloc[n] if len(x) > n else math.nan)
)
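A self-contained version of the snippet above, as a sketch: it reuses the question's data and adds one hypothetical two-row group ("1598004171") so the out-of-bounds branch actually runs. The helper name `nth_or_nan` is illustrative only, not a pandas API.

```python
import math

import pandas as pd

# Data from the question, plus a hypothetical group "1598004171" with only two rows
df = pd.DataFrame(
    [["1598004168", "5", "10", "500"], ["1598004168", "4", "8", "300"],
     ["1598004168", "3", "7", "600"], ["1598004168", "8", "6", "600"],
     ["1598004169", "2", "4", "100"], ["1598004169", "3", "2", "900"],
     ["1598004169", "3", "5", "300"], ["1598004170", "1", "8", "200"],
     ["1598004170", "4", "1", "400"], ["1598004170", "7", "3", "700"],
     ["1598004171", "6", "9", "800"], ["1598004171", "2", "4", "250"]],
    columns=["ts", "o", "c", "v"],
)

n = 2  # zero-based position within each group, i.e. the 3rd row


def nth_or_nan(s, pos=n):
    """Hypothetical helper: value at position `pos` of the group, or NaN if the group is too short."""
    return s.iloc[pos] if len(s) > pos else math.nan


# Named aggregation; reset_index() turns the "ts" group keys back into a column
result = df.groupby("ts").agg(
    o=("o", "first"),
    c=("c", "last"),
    vn=("v", nth_or_nan),
).reset_index()
print(result)
```

With n = 2 the three original groups get 600, 300 and 700 in `vn`, and the short group gets NaN instead of raising.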
You can use a lambda combined with .iloc[] (positions are zero-based, so x.iloc[2] returns the third value of each group):
df = df.groupby(["ts"], as_index=False).agg({'o':'first', 'c':'last', 'v': lambda x: x.iloc[2]})
Out[30]:
ts o c v
0 1598004168 5 6 600
1 1598004169 2 5 300
2 1598004170 1 3 700
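This works for the question's data because every group has at least three rows. As a minimal sketch of what happens otherwise, using hypothetical data where group "b" has only two rows, the plain `x.iloc[2]` lambda raises an IndexError for that group:

```python
import pandas as pd

# Hypothetical data: group "b" has only two rows, so position 2 does not exist there
df = pd.DataFrame({"ts": ["a", "a", "a", "b", "b"], "v": [1, 2, 3, 4, 5]})

try:
    df.groupby("ts").agg({"v": lambda x: x.iloc[2]})
    error = None
except IndexError as exc:  # raised by x.iloc[2] when it is called on group "b"
    error = exc

print(type(error).__name__)
```

This is the case the length check in the other answer guards against.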
Unfortunately, there currently seems to be no native way to do this with nth() in pandas.
The pandas devs have an open ticket, created in 2014, that points to exactly this problem.
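One workaround that avoids calling nth() inside agg altogether, sketched here with the question's data: number the rows of each group with `GroupBy.cumcount()`, keep the rows at position n, and left-join them onto the first/last aggregation. This sketch uses `cumcount` rather than `GroupBy.nth` because the shape of `nth`'s result has changed between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    [["1598004168", "5", "10", "500"], ["1598004168", "4", "8", "300"],
     ["1598004168", "3", "7", "600"], ["1598004168", "8", "6", "600"],
     ["1598004169", "2", "4", "100"], ["1598004169", "3", "2", "900"],
     ["1598004169", "3", "5", "300"], ["1598004170", "1", "8", "200"],
     ["1598004170", "4", "1", "400"], ["1598004170", "7", "3", "700"]],
    columns=["ts", "o", "c", "v"],
)

n = 2  # zero-based position within each group (2 -> 3rd row)

# first/last aggregation from the question, indexed by "ts"
base = df.groupby("ts").agg(o=("o", "first"), c=("c", "last"))

# cumcount() numbers the rows of each group 0, 1, 2, ... in their original order
pos = df.groupby("ts").cumcount()
nth_v = df.loc[pos == n].set_index("ts")["v"]

# Left join on "ts": groups with fewer than n + 1 rows would get NaN in "v"
result = base.join(nth_v).reset_index()
print(result)
```

For this data `v` comes out as 600, 300 and 700, matching the `iloc[2]` lambda above.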