hedebyhedge
hedebyhedge

Reputation: 455

Pandas: Get last values for each user containing certain value in other column

I have a dataframe like the following:

user        item      
A           1           
A           2          
A           2           
B           3           
B           4
B           4   

In general, how would I return a dataframe like this:

user        item      
A           2          
A           2           
B           4
B           4

Where you return for every user, the rows containing the last item value for that user?

I've tried the .last() method after grouping by user but it would only give one instance for A and B.

Upvotes: 0

Views: 476

Answers (4)

BENY
BENY

Reputation: 323316

In your case using transform

df[df.item==df.groupby('user').item.transform('last')]
Out[292]: 
  user  item
1    A     2
2    A     2
4    B     4
5    B     4

Upvotes: 2

Valentino
Valentino

Reputation: 7361

The more general solution I can think:

df['dupl'] = df.duplicated()

def lastdup(x):
    idx = x.loc[~x['dupl']].index[-1]
    return x.loc[idx:]

ddf = df.groupby('user').apply(lastdup)
ddf.drop('dupl', inplace=True, axis=1)

First save in a column a boolean value to mark duplicated lines. Then use groupby: in eack group look for the index of the last non duplicated row, and return from that row to the end of the group.

This solution will work for any number of repeated lines at the end. This number can be different from group to group, you do not need to know it in advance.

For example, if you use this input (note the extra A 2 row with respect to your example):

user        item      
A           1           
A           2          
A           2
A           2           
B           3           
B           4
B           4   

You'll get:

       user  item
user             
A    1    A     2
     2    A     2
     3    A     2
B    5    B     4
     6    B     4

Upvotes: 0

Andy
Andy

Reputation: 53

Set up your data:

data = [     
['A',1 ],          
['A',2 ] ,        
['A',2 ]  ,        
['B',3 ],       
['B',4],
['B', 4 ]
]
df = pd.DataFrame(data, columns=['user',  'item'])

Do the group_by with last and then merge it back into the original df:

grouped_df = pd.DataFrame(df.groupby('user').item.last()).reset_index()
pd.merge(df, grouped_df, on=['user', 'item'], how='inner') 

   user item
0   A   2
1   A   2
2   B   4
3   B   4

Upvotes: 0

Onel Harrison
Onel Harrison

Reputation: 1334

You can accomplish what you want by doing the following.

data = {
    'user': ['A', 'A', 'A', 'B', 'B', 'B'],
    'items': [1, 2, 2, 3, 4, 4]
}

df = pd.DataFrame(data)
df.groupby('user').tail(2)

Output

user    items
A       2
A       2
B       4
B       4

Upvotes: 0

Related Questions