user17717499
user17717499

Reputation: 151

Filling a column with the amount of duplicated values in another column

I have a df like this:

month outcome mom.ret
10/20 winner 0.2
10/20 winner 0.9
11/20 winner 0.6
11/20 winner 0.2
11/20 winner 0.9
10/20 loser 0.6
10/20 loser 0.2
10/20 loser 0.9
11/20 loser 0.6

I would like to add another column, which has 1 / by the counts of times the value "winner" or "loser" appears per each month on the column outcome. The expected output for the example df is:

month outcome mom.ret q
10/20 winner 0.2 1/2
10/20 winner 0.9 1/2
11/20 winner 0.6 1/3
11/20 winner 0.2 1/3
11/20 winner 0.9 1/3
10/20 loser 0.6 1/3
10/20 loser 0.2 1/3
10/20 loser 0.9 1/3
11/20 loser 0.6 1/1

I thought of using the function count to count how many times the values are repeated, but then I need to specify that the "count" should be done per each date. Any ideas?

Upvotes: 2

Views: 616

Answers (3)

jeffhale
jeffhale

Reputation: 4042

Updated answer:

@timgeb's required a groupby of just the month for me. To output a fraction instead of a decimal, I used the handy humanize library.

import humanize      # pip install humanize # if needed

df['q'] = 1 / df.groupby(['month', 'outcome'])['month'].transform('count')
df['q'] = df['q'].apply(lambda x : humanize.fractional(x))

Note that you can't just use .count() with the groupby - you need the transform method to return a Series of the same length as the original DataFrame.

Using Python 3.9.7, pandas 1.4.1

Code to make the original df (I left out the mom.ret column that wasn't germane).

import pandas as pd

df = pd.DataFrame(
    {
        "month": [
            "10/20",
            "10/20",
            "11/20",
            "11/20",
            "11/20",
            "10/20",
            "10/20",
            "10/20",
            "11/20",
        ],
        "outcome": [
            "winner",
            "winner",
            "winner",
            "winner",
            "winner",
            "loser",
            "loser",
            "loser",
            "loser",
        ],
    }
)

Upvotes: 0

timgeb
timgeb

Reputation: 78780

Use df['q'] = 1/df.groupby(['month', 'outcome']).transform('count').

Upvotes: 1

sunnytown
sunnytown

Reputation: 1996

You can use this code to achieve what you want, assuming your original DataFrame is called df:

counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division 0.5
# counts['q'] = 1/counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts)

The result looks like this:

month   outcome mom.ret q
0   10/20   winner  0.2 1/2
1   10/20   winner  0.9 1/2
2   11/20   winner  0.6 1/3
3   11/20   winner  0.2 1/3
4   11/20   winner  0.9 1/3
5   10/20   loser   0.6 1/2
6   10/20   loser   0.2 1/2
7   11/20   loser   0.9 1/2
8   11/20   loser   0.6 1/2

Upvotes: 1

Related Questions