Rulli

Reputation: 105

Pandas: calculating row-based percentages

I have my data frame looking like this:

import pandas as pd

ratings = {'rating': ['1','2','3','4', '5'],
        'F': [6,4,6,4,8],
        'M': [4,6,14,6,2]   
        }

df = pd.DataFrame(ratings, columns = ['rating', 'F','M'])

print (df)

   rating  F   M
        1  6   4
        2  4   6
        3  6  14
        4  4   6
        5  8   2

I would like to add two new columns, F_percentage and M_percentage, containing each value's share of its row total (F + M) as a percentage. For example:

 rating      F      M   F_percentage     M_percentage
      1      6      4        60%            40%
      2      4      6        40%            60%
      3      6      14       ........
      4      4      6        ........
      5      8      2        80%            20%

I would like to calculate the percentage of the total per row.

Thank you in advance!

Upvotes: 0

Views: 59

Answers (3)

jezrael

Reputation: 862511

If performance is important, you can divide the columns by their row sums with DataFrame.div to get a new DataFrame, then add it to the original with join:

df1 = df[['F','M']]
df = df.join(df1.div(df1.sum(axis=1), axis=0).add_suffix('_percentage').mul(100))
print (df)
  rating  F   M  F_percentage  M_percentage
0      1  6   4          60.0          40.0
1      2  4   6          40.0          60.0
2      3  6  14          30.0          70.0
3      4  4   6          40.0          60.0
4      5  8   2          80.0          20.0
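The axis=0 argument in the div call is what makes the division row-wise: it aligns the Series of row totals with the row index, so each row is divided by its own total. Here is a minimal sketch of that step in isolation, using the data from the question; the names counts, totals and shares are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

counts = df[['F', 'M']]
totals = counts.sum(axis=1)   # one total per row: 10, 10, 20, 10, 10

# axis=0 aligns `totals` with the row index,
# so every row is divided by its own total
shares = counts.div(totals, axis=0).mul(100)

# sanity check: the percentages in each row should add up to 100
# (rounded first, to absorb floating-point noise)
print(shares.sum(axis=1).round(6).eq(100).all())
```

Without axis=0, div would try to align the totals with the column labels F and M instead of with the rows, which is not what is wanted here.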

If you need strings with %, round the values to integers, convert them to strings, and finally append the percent sign:

# start from the original df, without the percentage columns added above
df1 = df[['F','M']]
df = (df.join(df1.div(df1.sum(axis=1), axis=0)
                         .add_suffix('_percentage').mul(100)
                         .round()
                         .astype(int)
                         .astype(str)
                         .add('%')))
print (df)
  rating  F   M F_percentage M_percentage
0      1  6   4          60%          40%
1      2  4   6          40%          60%
2      3  6  14          30%          70%
3      4  4   6          40%          60%
4      5  8   2          80%          20%

Upvotes: 1

Nikaido

Reputation: 4629

You can write a function that does the calculation and then apply it with the DataFrame apply method:

# female percentage
def f_percentage(row):
    tot = row['F'] + row['M']
    # round before converting, so a value like 28.999... becomes 29, not 28
    return str(int(round(row['F'] / tot * 100))) + '%'

df['F_percentage'] = df.apply(f_percentage, axis=1)

# male percentage
def m_percentage(row):
    tot = row['F'] + row['M']
    return str(int(round(row['M'] / tot * 100))) + '%'

df['M_percentage'] = df.apply(m_percentage, axis=1)

As other users have pointed out, the apply method has performance issues. For a small DataFrame that is irrelevant, but it is worth keeping in mind if the DataFrame is going to grow in the near future.
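To get a feel for the difference, here is a rough timing sketch. It is not a rigorous benchmark: the 10,000-row DataFrame is synthetic data invented for the comparison, the helper names with_apply and vectorized are hypothetical, and the actual timings will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data, invented for this comparison;
# counts start at 1 so no row total is zero
rng = np.random.default_rng(0)
big = pd.DataFrame({'F': rng.integers(1, 100, size=10_000),
                    'M': rng.integers(1, 100, size=10_000)})

def with_apply(frame):
    # row by row: one Python-level function call per row
    return frame.apply(lambda row: row['F'] / (row['F'] + row['M']) * 100, axis=1)

def vectorized(frame):
    # whole-column arithmetic, no Python-level loop
    return frame['F'] / (frame['F'] + frame['M']) * 100

# both approaches should produce the same numbers
assert np.allclose(with_apply(big), vectorized(big))

print('apply:     ', timeit.timeit(lambda: with_apply(big), number=3))
print('vectorized:', timeit.timeit(lambda: vectorized(big), number=3))
```

Only the F percentage is computed here to keep the comparison short; the M percentage works the same way.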

Upvotes: 1

Anan

Reputation: 181

Here's a complete solution for you -

import pandas as pd

percentage_F_list = []
percentage_M_list = []

ratings = {'rating': ['1','2','3','4', '5'],
        'F': [6,4,6,4,8],
        'M': [4,6,14,6,2]   
        }

df = pd.DataFrame(ratings, columns = ['rating', 'F','M'])

print (df)


# compute each row's percentages from its own total (F + M)
for f, m in zip(df['F'], df['M']):
    tot = f + m
    percentage_F_list.append(f / tot * 100)
    percentage_M_list.append(m / tot * 100)
    
df['F_percentage'] = percentage_F_list
df['M_percentage'] = percentage_M_list

Upvotes: 1
