BARUN

Reputation: 139

How to select the nth item after groupby using a dict

I've written the code below to create a DataFrame and group it using a dict of aggregations.

import pandas as pd

df = pd.DataFrame([["1598004168", "5", "10", "500"],["1598004168", "4", "8", "300"], ["1598004168", "3", "7", "600"], ["1598004168", "8", "6", "600"], ["1598004169", "2", "4", "100"], ["1598004169", "3", "2", "900"], ["1598004169", "3", "5", "300"], ["1598004170", "1", "8", "200"], ["1598004170", "4", "1", "400"], ["1598004170", "7", "3", "700"]], columns=["ts", "o", "c", "v"])
df = df.groupby(["ts"], as_index=False).agg({'o':'first', 'c':'last', 'v':'last'})
print(df)

The above code works fine. Now I want to get the 2nd item from every group for column 'v'; the expected output is shown below.

Expected output:

           ts  o  c    v
0  1598004168  5  6  600
1  1598004169  2  5  300
2  1598004170  1  3  400

Please share the appropriate code for this, as the code below does not work.

df = df.groupby(["ts"], as_index=False).agg({'o':'first', 'c':'last', 'v':'nth(2)'})

Upvotes: 2

Views: 478

Answers (2)

Danny Varod

Reputation: 18068

As the other answer stated, you can use a lambda function. However, you will have to check the series length to make sure you don't go out of bounds (IndexError).

For example:

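import math

n = 2  # zero-based position to pick; 2 matches nth(2) in the question

df.groupby('ts').agg(
    o=('o', 'first'),
    c=('c', 'last'),
    # fall back to NaN when a group has n or fewer rows
    vn=('v', lambda x: x.iloc[n] if len(x) > n else math.nan)
)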
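
To make the bounds check concrete, here is a minimal self-contained sketch using the sample data from the question. The helper name agg_with_nth and the output column name v_nth are made up for illustration; positions are zero-based, so n=2 is the third row of each group.

```python
import math
import pandas as pd

df = pd.DataFrame(
    [["1598004168", "5", "10", "500"], ["1598004168", "4", "8", "300"],
     ["1598004168", "3", "7", "600"], ["1598004168", "8", "6", "600"],
     ["1598004169", "2", "4", "100"], ["1598004169", "3", "2", "900"],
     ["1598004169", "3", "5", "300"], ["1598004170", "1", "8", "200"],
     ["1598004170", "4", "1", "400"], ["1598004170", "7", "3", "700"]],
    columns=["ts", "o", "c", "v"],
)

def agg_with_nth(frame, n):
    """Hypothetical helper: first 'o', last 'c', and the item of 'v' at position n."""
    return frame.groupby("ts").agg(
        o=("o", "first"),
        c=("c", "last"),
        # NaN instead of IndexError when the group has n or fewer rows
        v_nth=("v", lambda s: s.iloc[n] if len(s) > n else math.nan),
    )

out2 = agg_with_nth(df, 2)  # every group has at least 3 rows
out3 = agg_with_nth(df, 3)  # only the first group has a 4th row
print(out2)
print(out3)
```

With n=3, only the first group has enough rows, so the other two groups get NaN rather than raising.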

Upvotes: 0

Andreas

Reputation: 9197

You can use a lambda combined with .iloc[] (positions are zero-based, so .iloc[2] picks the third row of each group):

df = df.groupby(["ts"], as_index=False).agg({'o':'first', 'c':'last', 'v': lambda x: x.iloc[2]})

Out[30]: 
           ts  o  c    v
0  1598004168  5  6  600
1  1598004169  2  5  300
2  1598004170  1  3  700
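
If you would rather avoid a lambda, one possible alternative (a sketch, not a built-in agg feature) is to pick the row at position n in each group with groupby().cumcount() and merge it back onto the aggregated frame. The variable names agg_df, nth_v and result are only illustrative. The left merge leaves NaN for groups that have fewer than n + 1 rows.

```python
import pandas as pd

df = pd.DataFrame(
    [["1598004168", "5", "10", "500"], ["1598004168", "4", "8", "300"],
     ["1598004168", "3", "7", "600"], ["1598004168", "8", "6", "600"],
     ["1598004169", "2", "4", "100"], ["1598004169", "3", "2", "900"],
     ["1598004169", "3", "5", "300"], ["1598004170", "1", "8", "200"],
     ["1598004170", "4", "1", "400"], ["1598004170", "7", "3", "700"]],
    columns=["ts", "o", "c", "v"],
)

n = 2  # zero-based position within each group

# Aggregate the columns that have a built-in reducer
agg_df = df.groupby("ts", as_index=False).agg({"o": "first", "c": "last"})

# cumcount() numbers the rows inside each group 0, 1, 2, ...
nth_v = df.loc[df.groupby("ts").cumcount() == n, ["ts", "v"]]

# Left merge keeps every group; short groups end up with NaN in 'v'
result = agg_df.merge(nth_v, on="ts", how="left")
print(result)
```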

Unfortunately, it seems there is currently no native way to use nth() inside agg() in pandas. The pandas devs have an open ticket, created in 2014, which points to exactly that problem:

Upvotes: 1
