Reputation: 455
I have a dataframe like the following:
user item
A 1
A 2
A 2
B 3
B 4
B 4
In general, how would I return a dataframe like this:
user item
A 2
A 2
B 4
B 4
Where you return for every user, the rows containing the last item value for that user?
I've tried the .last() method after grouping by user but it would only give one instance for A and B.
Upvotes: 0
Views: 476
Reputation: 323316
In your case using transform
df[df.item==df.groupby('user').item.transform('last')]
Out[292]:
user item
1 A 2
2 A 2
4 B 4
5 B 4
Upvotes: 2
Reputation: 7361
The more general solution I can think:
df['dupl'] = df.duplicated()
def lastdup(x):
idx = x.loc[~x['dupl']].index[-1]
return x.loc[idx:]
ddf = df.groupby('user').apply(lastdup)
ddf.drop('dupl', inplace=True, axis=1)
First save in a column a boolean value to mark duplicated lines. Then use groupby
: in eack group look for the index of the last non duplicated row, and return from that row to the end of the group.
This solution will work for any number of repeated lines at the end. This number can be different from group to group, you do not need to know it in advance.
For example, if you use this input (note the extra A 2
row with respect to your example):
user item
A 1
A 2
A 2
A 2
B 3
B 4
B 4
You'll get:
user item
user
A 1 A 2
2 A 2
3 A 2
B 5 B 4
6 B 4
Upvotes: 0
Reputation: 53
Set up your data:
data = [
['A',1 ],
['A',2 ] ,
['A',2 ] ,
['B',3 ],
['B',4],
['B', 4 ]
]
df = pd.DataFrame(data, columns=['user', 'item'])
Do the group_by with last and then merge it back into the original df:
grouped_df = pd.DataFrame(df.groupby('user').item.last()).reset_index()
pd.merge(df, grouped_df, on=['user', 'item'], how='inner')
user item
0 A 2
1 A 2
2 B 4
3 B 4
Upvotes: 0
Reputation: 1334
You can accomplish what you want by doing the following.
data = {
'user': ['A', 'A', 'A', 'B', 'B', 'B'],
'items': [1, 2, 2, 3, 4, 4]
}
df = pd.DataFrame(data)
df.groupby('user').tail(2)
Output
user items
A 2
A 2
B 4
B 4
Upvotes: 0