Reputation: 165
I have the following dataframe:
userid | month |
---|---|
user1 | jan |
user2 | jan |
user3 | jan |
user1 | feb |
user3 | feb |
user1 | march |
If a user appears in more than 2 months, I want to mark them as active; otherwise, not active. The desired output is:
userid | month | active |
---|---|---|
user1 | jan,feb,march | true |
user2 | jan | false |
user3 | jan,feb | false |
How can I do this with pandas? Apologies for not having any starting code; I am not sure where to begin. Any help for a newbie would be appreciated.
Upvotes: 0
Views: 13
Reputation: 863031
Use GroupBy.agg with join and a lambda function:
df = df.groupby('userid').agg(month=('month', ','.join),
                              active=('month', lambda x: len(x) > 2))
print(df)
                month  active
userid
user1   jan,feb,march    True
user2             jan   False
user3         jan,feb   False
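For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question; the variable name `result` is my own choice, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": ["user1", "user2", "user3", "user1", "user3", "user1"],
    "month": ["jan", "jan", "jan", "feb", "feb", "march"],
})

# One output row per user:
#   month  -> all of the user's months joined with commas
#   active -> True only if the user has more than 2 rows
result = df.groupby("userid").agg(
    month=("month", ",".join),
    active=("month", lambda x: len(x) > 2),
)
print(result)
```

Note that `userid` ends up as the index here; calling `result.reset_index()` would turn it back into a regular column, as in the desired output.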
Or count the rows per group with size, then convert the counts to a boolean:
df = (df.groupby('userid').agg(month = ('month', ','.join), active=('month','size'))
.assign(active = lambda x: x['active'].gt(2)))
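Both approaches count rows, so they assume each user/month pair appears only once. If the real data can contain repeated pairs (an assumption on my part, since the sample does not), counting distinct months with `nunique` may be closer to what is wanted. A rough sketch, with one duplicated row added purely for illustration:

```python
import pandas as pd

# Sample data from the question, plus one hypothetical duplicate (user3, feb)
df = pd.DataFrame({
    "userid": ["user1", "user2", "user3", "user1", "user3", "user1", "user3"],
    "month": ["jan", "jan", "jan", "feb", "feb", "march", "feb"],
})

result = (
    df.groupby("userid")
      .agg(
          # join only the distinct months, in order of first appearance
          month=("month", lambda x: ",".join(x.unique())),
          # number of distinct months per user
          active=("month", "nunique"),
      )
      # turn the count into the boolean flag
      .assign(active=lambda x: x["active"].gt(2))
      # make userid a regular column again
      .reset_index()
)
print(result)
```

With the duplicate row, user3 still has only two distinct months and so stays inactive, whereas a plain row count would have flagged them as active.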
Upvotes: 1