leilei

Reputation: 71

How to count cumulative unique values by group in Python?

How can I count cumulative unique values by group in Python?

Below is the dataframe example:

Group  Year  Type
A      1998  red
A      1998  blue
A      2002  red
A      2005  blue
A      2008  blue
A      2008  yello
B      1998  red
B      2001  red
B      2003  red
C      1996  red
C      2002  orange
C      2002  red
C      2012  blue
C      2012  yello

I need to create a new column, computed separately for each value of column "Group". Its value should be the cumulative number of unique values of column "Type", accumulated over column "Year".

Below is the dataframe I want. For example:
(1) For Group A in year 1998, I want to count the unique values of Type in 1998; there are two: red and blue.
(2) For Group A in year 2002, I want to count the unique values of Type across 1998 and 2002; there are still two: red and blue.
(3) For Group A in year 2008, I want to count the unique values of Type across 1998, 2002, 2005, and 2008; there are three: red, blue, and yello.

Group  Year  Type    Want
A      1998  red     2
A      1998  blue    2
A      2002  red     2
A      2005  blue    2
A      2008  blue    3
A      2008  yello   3
B      1998  red     1
B      2001  red     1
B      2003  red     1
C      1996  red     1
C      2002  orange  2
C      2002  red     2
C      2012  blue    4
C      2012  yello   4

One more thing about this dataframe: not all groups have values in the same years. For example, Group A has two rows in each of 1998 and 2008 and one row in each of 2002 and 2005, while Group B has rows in 1998, 2001, and 2003.

I wonder how to address this problem. Your great help means a lot to me. Thanks!

Upvotes: 1

Views: 67

Answers (1)

jezrael

Reputation: 862641

Use a custom lambda function with factorize in GroupBy.transform (shown here on a sample with one row per Group and Year):

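import pandas as pd
# factorize numbers each distinct Type per Group in order of first appearance (0-based)
f = lambda x: pd.factorize(x)[0]
df['Want1'] = df.groupby('Group', sort=False)['Type'].transform(f) + 1
print(df)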
   Group  Year    Type  Want1
0      A  1998     red      1
1      A  2002     red      1
2      A  2005    blue      2
3      A  2008    blue      2
4      A  2009   yello      3
5      B  1998     red      1
6      B  2001     red      1
7      B  2003     red      1
8      C  1996     red      1
9      C  2002  orange      2
10     C  2008    blue      3
11     C  2012   yello      4

Upvotes: 1
