Pandas count values in one column where another column stays the same

Question

I'm trying to get a count of all values in one column where another column stays the same.. having a real brain fart here. Below is an example since I know the wording is kind of weird to read.

Data:

a | item1
a | item2
a | item3
a | item4
b | item1
b | item3
c | item4

What I am trying to achieve:

a | item1 | 4
a | item2 | 4 
a | item3 | 4
a | item4 | 4
b | item1 | 2
b | item3 | 2
c | item4 | 1

I've tried a few different ways of doing this but keep ending up with some really weird count numbers. Also, my end goal is to have this data all consolidated to an excel sheet. I currently have a fairly large sheet with a bunch of columns, but I can't figure out how to get this count.

That being said, since it is already in excel and the end goal is excel, I would be fine with a solution there too. Just figured Pandas would likely be faster and easier than writing an excel function of some kind.

Celius Stingher · Accepted Answer

You can use groupby() with transform:

df['col_3'] = df.groupby('col_1')['col_2'].transform('count')

Output:

  col_1 col_2   col_3
0     a     1       4
1     a     2       4
2     a     3       4
3     a     4       4
4     b     1       2
5     b     3       2
6     c     4       1

Example data:

df = pd.DataFrame({'col_1':['a','a','a','a','b','b','c'],
                   'col_2':[1,2,3,4,1,3,4]})

Pandas count values in one column where another column stays the same

Answers (1)

Related Questions