Reputation: 151
I have a df like this:
month | outcome | mom.ret |
---|---|---|
10/20 | winner | 0.2 |
10/20 | winner | 0.9 |
11/20 | winner | 0.6 |
11/20 | winner | 0.2 |
11/20 | winner | 0.9 |
10/20 | loser | 0.6 |
10/20 | loser | 0.2 |
10/20 | loser | 0.9 |
11/20 | loser | 0.6 |
I would like to add another column, which has 1 / by the counts of times the value "winner" or "loser" appears per each month on the column outcome. The expected output for the example df is:
month | outcome | mom.ret | q |
---|---|---|---|
10/20 | winner | 0.2 | 1/2 |
10/20 | winner | 0.9 | 1/2 |
11/20 | winner | 0.6 | 1/3 |
11/20 | winner | 0.2 | 1/3 |
11/20 | winner | 0.9 | 1/3 |
10/20 | loser | 0.6 | 1/3 |
10/20 | loser | 0.2 | 1/3 |
10/20 | loser | 0.9 | 1/3 |
11/20 | loser | 0.6 | 1/1 |
I thought of using the function count to count how many times the values are repeated, but then I need to specify that the "count" should be done per each date. Any ideas?
Upvotes: 2
Views: 616
Reputation: 4042
Updated answer:
@timgeb's required a groupby of just the month for me. To output a fraction instead of a decimal, I used the handy humanize library.
import humanize # pip install humanize # if needed
df['q'] = 1 / df.groupby(['month', 'outcome'])['month'].transform('count')
df['q'] = df['q'].apply(lambda x : humanize.fractional(x))
Note that you can't just use .count()
with the groupby - you need the transform method to return a Series of the same length as the original DataFrame.
Using Python 3.9.7, pandas 1.4.1
Code to make the original df (I left out the mom.ret column that wasn't germane).
import pandas as pd
df = pd.DataFrame(
{
"month": [
"10/20",
"10/20",
"11/20",
"11/20",
"11/20",
"10/20",
"10/20",
"10/20",
"11/20",
],
"outcome": [
"winner",
"winner",
"winner",
"winner",
"winner",
"loser",
"loser",
"loser",
"loser",
],
}
)
Upvotes: 0
Reputation: 78780
Use df['q'] = 1/df.groupby(['month', 'outcome']).transform('count')
.
Upvotes: 1
Reputation: 1996
You can use this code to achieve what you want, assuming your original DataFrame is called df
:
counts = df.groupby(['month', 'outcome'], as_index=False).count()
counts = counts.rename(columns={'mom.ret': 'q'})
# Use this line if you want the float value of the division 0.5
# counts['q'] = 1/counts['q']
# Use this line if you want the string '1/2'
counts['q'] = counts['q'].apply(lambda x: f'1/{x}')
result = pd.merge(df, counts)
The result looks like this:
month outcome mom.ret q
0 10/20 winner 0.2 1/2
1 10/20 winner 0.9 1/2
2 11/20 winner 0.6 1/3
3 11/20 winner 0.2 1/3
4 11/20 winner 0.9 1/3
5 10/20 loser 0.6 1/2
6 10/20 loser 0.2 1/2
7 11/20 loser 0.9 1/2
8 11/20 loser 0.6 1/2
Upvotes: 1