ProcolHarum
ProcolHarum

Reputation: 741

how to group by and transform the count back to the dataframe

Say I have the following df:

years = []
months = []
ys = [2003,2003,2004,2005]

for y in ys:
    for i in range(1,4):
        years.append(y)
        months.append(i)

df= pd.DataFrame({"year":years,'month':months})
df

    year    month
0   2003    1
1   2003    2
2   2003    3
3   2003    1
4   2003    2
5   2003    3
6   2004    1
7   2004    2
8   2004    3
9   2005    1
10  2005    2
11  2005    3

I wish to group by year and month and transform back to the df so the result would be:

    year    month   count
0   2003    1        1
1   2003    2        2
2   2003    3        3
3   2003    1        1
4   2003    2        2
5   2003    3        3
6   2004    1        4
7   2004    2        5
8   2004    3        6
9   2005    1        7
10  2005    2        8 
11  2005    3        9

I tired df['count'] = df.groupby(['year','month']).transform('count') but I get 'Wrong number of items passed 0, placement implies 1'

Upvotes: 3

Views: 130

Answers (2)

anky
anky

Reputation: 75080

Adding another method with zip and pd.factorize which will return a unique number in sequence for each group:

df["count"] = pd.factorize([*zip(df['year'],df['month'])])[0]+1

Or using df.records with factorize:

cols = ['year','month']
df["count"] = pd.factorize(df[cols].to_records(index=False))[0]+1

print(df)

    year  month  count
0   2003      1      1
1   2003      2      2
2   2003      3      3
3   2003      1      1
4   2003      2      2
5   2003      3      3
6   2004      1      4
7   2004      2      5
8   2004      3      6
9   2005      1      7
10  2005      2      8
11  2005      3      9

Upvotes: 3

Andrej Kesely
Andrej Kesely

Reputation: 195458

Use .ngroup():

df["count"] = df.groupby(["year", "month"]).ngroup() + 1
print(df)

Prints:

    year  month  count
0   2003      1      1
1   2003      2      2
2   2003      3      3
3   2003      1      1
4   2003      2      2
5   2003      3      3
6   2004      1      4
7   2004      2      5
8   2004      3      6
9   2005      1      7
10  2005      2      8
11  2005      3      9

Upvotes: 4

Related Questions