TheDude

Reputation: 1355

Use pandas groupby.size() results for arithmetical operation

I've run into the following problem and haven't been able to resolve it, either on my own or with the help of similar questions on Stack Overflow.

To keep it simple, I'll give a short example of my problem:

I have a DataFrame with several columns, one of which holds the ID of a user. The same user may have several entries in this DataFrame:

|   |  userID   |      col2      | col3  |
+---+-----------+----------------+-------+
| 1 | 1         | a              |     b |
| 2 | 1         | c              |     d |
| 3 | 2         | a              |     a |
| 4 | 3         | d              |     e |

Something like this. Now I want to know the number of rows that belong to a certain userID. For this I tried df.groupby('userID').size(), whose result I then want to use in another simple calculation, such as a division. But when I try to save the result of the calculation in a separate column, I keep getting NaN values.

Is there a way to solve this so that I get the result of the calculation in a separate column?

Thanks for your help!

edit//

To make clear what my output should look like: the DataFrame above is my main DataFrame, so to speak. Besides it, I have a second DataFrame that looks like this:

|   |  userID   |      value     | value/appearances  |
+---+-----------+----------------+--------------------+
| 1 | 1         | 10             |     10 / 2 = 5     |
| 3 | 2         | 20             |     20 / 1 = 20    |
| 4 | 3         | 30             |     30 / 1 = 30    |

So basically I want the column 'value/appearances' to hold the number in the value column divided by the number of times that user appears in the main DataFrame. For the user with ID=1 this would be 10/2, since this user has a value of 10 and 2 rows in the main DataFrame. I hope this makes it a bit clearer.

Upvotes: 0

Views: 534

Answers (1)

EdChum

Reputation: 394169

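IIUC you want to do the following: groupby on 'userID', select the grouped column, and call transform, passing 'size' as the name of the method to apply. Unlike size(), transform returns one value per original row, so the result lines up with the df's index: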

In [54]:
df['size'] = df.groupby('userID')['userID'].transform('size')
df

Out[54]:
   userID col2 col3  size
1       1    a    b     2
2       1    c    d     2
3       2    a    a     1
4       3    d    e     1
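
For anyone who wants to reproduce this outside an IPython session, here is a minimal self-contained sketch. The frame is rebuilt from the table in the question, and the 1..4 index is an assumption taken from how the rows are numbered there:

```python
import pandas as pd

# Rebuild the example frame from the question.
# The index 1..4 is assumed from the row labels shown in the question's table.
df = pd.DataFrame(
    {
        "userID": [1, 1, 2, 3],
        "col2": ["a", "c", "a", "d"],
        "col3": ["b", "d", "a", "e"],
    },
    index=[1, 2, 3, 4],
)

# transform('size') broadcasts each group's row count back to every row
# of that group, so the result shares df's index and assigns cleanly.
df["size"] = df.groupby("userID")["userID"].transform("size")
print(df)
```

Because the result has the same index as df, it can be used directly in row-wise arithmetic against any other column of df.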

What you tried:

In [55]:
df.groupby('userID').size()

Out[55]:
userID
1    2
2    1
3    1
dtype: int64

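When this is assigned back to the df, it is aligned on the df's index. The result is indexed by userID (1, 2, 3), not by the df's row labels (1, 2, 3, 4), so the counts land on whichever rows happen to share a label, and NaN is introduced for the last row: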

In [57]:
df['size'] = df.groupby('userID').size()
df

Out[57]:
   userID col2 col3  size
1       1    a    b     2
2       1    c    d     1
3       2    a    a     1
4       3    d    e   NaN
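
For the second frame described in the question's edit, one possible approach (a sketch, not the only way) is to keep the size() result as a lookup keyed by userID and map it onto the second frame's userID column. The names df2 and counts are made up for illustration, and both frames are reduced to the columns shown in the question:

```python
import pandas as pd

# Main frame from the question (only the column needed for counting).
df = pd.DataFrame({"userID": [1, 1, 2, 3]}, index=[1, 2, 3, 4])

# Second frame from the question's edit; "df2" is a hypothetical name,
# and the index 1, 3, 4 is copied from the table shown there.
df2 = pd.DataFrame({"userID": [1, 2, 3], "value": [10, 20, 30]}, index=[1, 3, 4])

# size() is indexed by userID, so use it as a lookup table rather than
# assigning it directly (direct assignment would align on row labels).
counts = df.groupby("userID").size()

# map() looks up each row's userID in counts, giving a Series that shares
# df2's index, so the division is row-by-row.
df2["value/appearances"] = df2["value"] / df2["userID"].map(counts)
print(df2)
```

Mapping on the userID values sidesteps the index-alignment issue entirely, since the lookup goes through the column contents rather than the row labels.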

Upvotes: 2
