Reputation: 105
I have a DataFrame that looks like this:
import pandas as pd
ratings = {'rating': ['1', '2', '3', '4', '5'],
           'F': [6, 4, 6, 4, 8],
           'M': [4, 6, 14, 6, 2]
           }
df = pd.DataFrame(ratings, columns=['rating', 'F', 'M'])
print(df)
rating  F   M
     1  6   4
     2  4   6
     3  6  14
     4  4   6
     5  8   2
I would like to add two new columns, F_percentage and M_percentage, containing each value's percentage share of the row total (F + M). That is:
rating  F   M  F_percentage  M_percentage
     1  6   4           60%           40%
     2  4   6           40%           60%
     3  6  14  ........
     4  4   6  ........
     5  8   2           80%           20%
I would like to calculate the percentage of the total per row.
Thank you in advance!
Upvotes: 0
Views: 59
Reputation: 862511
If performance is important, divide the F and M columns by their row-wise sum with DataFrame.div to get a new DataFrame, then add it to the original with join:
df1 = df[['F','M']]
df = df.join(df1.div(df1.sum(axis=1), axis=0).add_suffix('_percentage').mul(100))
print (df)
  rating  F   M  F_percentage  M_percentage
0      1  6   4          60.0          40.0
1      2  4   6          40.0          60.0
2      3  6  14          30.0          70.0
3      4  4   6          40.0          60.0
4      5  8   2          80.0          20.0
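If you need strings with a % sign, round the values, convert them to integers and then to strings, and finally append '%':
# start from the original df (without the percentage columns added above),
# otherwise join raises an error because the column names overlap
df1 = df[['F','M']]
df = (df.join(df1.div(df1.sum(axis=1), axis=0)
                 .add_suffix('_percentage').mul(100)
                 .round()        # avoid truncating values such as 29.999...
                 .astype(int)
                 .astype(str)
                 .add('%')))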
print (df)
  rating  F   M  F_percentage  M_percentage
0      1  6   4           60%           40%
1      2  4   6           40%           60%
2      3  6  14           30%           70%
3      4  4   6           40%           60%
4      5  8   2           80%           20%
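As an alternative sketch for the string version, the shares could also be formatted with a format string instead of chained astype calls. This assumes the original df (before any percentage columns were added); `shares` is a name introduced here only for illustration, and '{:.0f}%' rounds to whole percents:

```python
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

# Row-wise percentage shares of F and M ("shares" is an illustrative name).
shares = df[['F', 'M']].div(df[['F', 'M']].sum(axis=1), axis=0).mul(100)

# Format each share as a whole-number percent string, one column at a time.
for col in shares.columns:
    df[col + '_percentage'] = shares[col].map('{:.0f}%'.format)

print(df)
```

Changing the format string to '{:.1f}%' would keep one decimal place instead.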
Upvotes: 1
Reputation: 4629
You can write a function that computes the percentage for one row and apply it to the DataFrame with the apply method:
# female percentage
def f_percentage(row):
    tot = row['F'] + row['M']
    # round before converting, so 29.999... becomes 30 rather than 29
    return str(int(round(row['F'] / tot * 100))) + '%'

df['F_percentage'] = df.apply(f_percentage, axis=1)

# male percentage
def m_percentage(row):
    tot = row['F'] + row['M']
    return str(int(round(row['M'] / tot * 100))) + '%'

df['M_percentage'] = df.apply(m_percentage, axis=1)
As other users pointed out, the apply method has performance issues. For a small DataFrame this is irrelevant, but it is worth keeping in mind if, for example, the DataFrame is going to grow in the near future.
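To get a rough feel for the difference, here is a minimal timing sketch. The row count, the random data, and the helper names (`share_vectorized`, `share_apply`) are assumptions made up for this illustration, and the timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 5_000  # hypothetical size, chosen only for the comparison

# Random counts between 1 and 20, so the row total is never zero.
big = pd.DataFrame({
    'F': rng.integers(1, 21, size=n_rows),
    'M': rng.integers(1, 21, size=n_rows),
})

def share_vectorized(frame):
    # Column arithmetic: one operation over the whole column.
    return frame['F'] / (frame['F'] + frame['M']) * 100

def share_apply(frame):
    # Row-wise apply: calls a Python function once per row.
    return frame.apply(lambda row: row['F'] / (row['F'] + row['M']) * 100, axis=1)

vec_result = share_vectorized(big)
apply_result = share_apply(big)

t_vec = timeit.timeit(lambda: share_vectorized(big), number=3)
t_apply = timeit.timeit(lambda: share_apply(big), number=3)
print(f"vectorized: {t_vec:.4f}s, apply: {t_apply:.4f}s (3 runs each)")
```

Both versions compute the same values; only the time taken differs.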
Upvotes: 1
Reputation: 181
Here's a complete solution using an explicit loop:
import pandas as pd
percentage_F_list = []
percentage_M_list = []
ratings = {'rating': ['1', '2', '3', '4', '5'],
           'F': [6, 4, 6, 4, 8],
           'M': [4, 6, 14, 6, 2]
           }
df = pd.DataFrame(ratings, columns=['rating', 'F', 'M'])
print(df)
for i in range(df.shape[0]):
    tot = df['F'][i] + df['M'][i]
    percentage_F_list.append((df['F'][i])/tot * 100)
    percentage_M_list.append((df['M'][i])/tot * 100)
df['F_percentage'] = percentage_F_list
df['M_percentage'] = percentage_M_list
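Note that df['F'][i] looks values up by index label, so the loop above relies on the default 0..n-1 index. A minimal sketch of the same loop that iterates over the values directly with zip, and so does not depend on the index labels, might look like this (the letter index is only there to illustrate the point):

```python
import pandas as pd

# Same data as above, but with a non-default index for illustration.
df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]},
                  index=list('abcde'))

percentage_F_list = []
percentage_M_list = []

# Walk over the F and M values pairwise, regardless of the index labels.
for f, m in zip(df['F'], df['M']):
    tot = f + m
    percentage_F_list.append(f / tot * 100)
    percentage_M_list.append(m / tot * 100)

df['F_percentage'] = percentage_F_list
df['M_percentage'] = percentage_M_list
print(df)
```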
Upvotes: 1