Reputation: 741
Say I have the following df:
years = []
months = []
ys = [2003,2003,2004,2005]
for y in ys:
for i in range(1,4):
years.append(y)
months.append(i)
df= pd.DataFrame({"year":years,'month':months})
df
year month
0 2003 1
1 2003 2
2 2003 3
3 2003 1
4 2003 2
5 2003 3
6 2004 1
7 2004 2
8 2004 3
9 2005 1
10 2005 2
11 2005 3
I wish to group by year and month and transform back to the df so the result would be:
year month count
0 2003 1 1
1 2003 2 2
2 2003 3 3
3 2003 1 1
4 2003 2 2
5 2003 3 3
6 2004 1 4
7 2004 2 5
8 2004 3 6
9 2005 1 7
10 2005 2 8
11 2005 3 9
I tired df['count'] = df.groupby(['year','month']).transform('count')
but I get 'Wrong number of items passed 0, placement implies 1'
Upvotes: 3
Views: 130
Reputation: 75080
Adding another method with zip
and pd.factorize
which will return a unique number in sequence for each group:
df["count"] = pd.factorize([*zip(df['year'],df['month'])])[0]+1
Or using df.records
with factorize
:
cols = ['year','month']
df["count"] = pd.factorize(df[cols].to_records(index=False))[0]+1
print(df)
year month count
0 2003 1 1
1 2003 2 2
2 2003 3 3
3 2003 1 1
4 2003 2 2
5 2003 3 3
6 2004 1 4
7 2004 2 5
8 2004 3 6
9 2005 1 7
10 2005 2 8
11 2005 3 9
Upvotes: 3
Reputation: 195458
Use .ngroup()
:
df["count"] = df.groupby(["year", "month"]).ngroup() + 1
print(df)
Prints:
year month count
0 2003 1 1
1 2003 2 2
2 2003 3 3
3 2003 1 1
4 2003 2 2
5 2003 3 3
6 2004 1 4
7 2004 2 5
8 2004 3 6
9 2005 1 7
10 2005 2 8
11 2005 3 9
Upvotes: 4