Reputation: 323
I have the following df1:
id period color size rate
1 01 red 12 30
1 02 red 12 30
2 01 blue 12 35
3 03 blue 12 35
4 01 blue 12 35
4 02 blue 12 35
5 01 pink 10 40
6 01 pink 10 40
I need to create an index column that is basically a group of color-size-rate and count the number of unique id(s) that have this combination. My output df after transformation should look like this df1:
id period color size rate index count
1 01 red 12 30 red-12-30 1
1 02 red 12 30 red-12-30 1
2 01 blue 12 35 blue-12-35 3
2 03 blue 12 35 blue-12-35 3
4 01 blue 12 35 blue-12-35 3
4 02 blue 12 35 blue-12-35 3
5 01 pink 10 40 pink-10-40 2
6 01 pink 10 40 pink-10-40 2
I am able to get the count but it is not counting "unique" ids but instead , just the number of occurrences.
1 01 red 12 30 red-12-30 2
1 02 red 12 30 red-12-30 2
2 01 blue 12 35 blue-12-35 4
2 03 blue 12 35 blue-12-35 4
4 01 blue 12 35 blue-12-35 4
4 02 blue 12 35 blue-12-35 4
This is wrong as it is not actually grouping by id to count unique ones.
Appreciate any pointers in this direction.
Adding an edit here as my requirement changed: The count need to also be grouped by 'period' ie my final df should be:
index period count
red-12-30 01 1
red-12-30 02 1
blue-12-35 01 2
blue-12-35 03 1
blue-12-35 02 1
pink-10-40 01 2
Solution: from @anky: When I try adding another groupby['period'], I am getting dimension mismatch error.
Thank you in advance.
Upvotes: 1
Views: 59
Reputation: 75080
You can try aggregating a join
for creating the index column, then group on the same and get nunique
using groupby+transform
idx = df[['color','size','rate']].astype(str).agg('-'.join,1)
out = df.assign(index=idx,count=df.groupby(idx)['id'].transform('nunique'))
print(out)
id period color size rate index count
0 1 1 red 12 30 red-12-30 1
1 1 2 red 12 30 red-12-30 1
2 2 1 blue 12 35 blue-12-35 3
3 3 3 blue 12 35 blue-12-35 3
4 4 1 blue 12 35 blue-12-35 3
5 4 2 blue 12 35 blue-12-35 3
6 5 1 pink 10 40 pink-10-40 2
7 6 1 pink 10 40 pink-10-40 2
Upvotes: 2