user977828

Reputation: 7679

How to calculate percentage with Pandas' DataFrame

How do I add another column containing percentages to a pandas DataFrame? The dict can change in size.

>>> import pandas as pd
>>> a = {'Test 1': 4, 'Test 2': 1, 'Test 3': 1, 'Test 4': 9}
>>> p = pd.DataFrame(a.items())
>>> p
        0  1
0  Test 2  1
1  Test 3  1
2  Test 1  4
3  Test 4  9

[4 rows x 2 columns]

Upvotes: 18

Views: 148257

Answers (5)

Luis G Riera

Reputation: 41

I take the following approach when exploring model training data.

In [1]: import pandas as pd

In [2]: d = {"set1":[59268, 6166, 115], "set2":[12700, 9892, 238]}
   ...: idx_labels = ["Train", "Validation", "Test"]
   ...: df = pd.DataFrame(data=d, index=idx_labels)

In [3]: df
Out[3]:
             set1   set2
Train       59268  12700
Validation   6166   9892
Test          115    238

In [4]: def compute_ratio(df, target, num_decimal: int = 4) -> pd.Series:
   ...:     if target in df.columns:
   ...:         divider = df.loc[:, :].T.sum()
   ...:         pct = df.loc[:, target] / divider
   ...:     elif target in df.index:
   ...:         divider = df.loc[:, :].sum()
   ...:         pct = df.loc[target, :] / divider
   ...:     return round(pct, num_decimal)
   ...:

In [5]: df["set1_ratio"] = compute_ratio(df, "set1", num_decimal=5)

In [6]: df["set2_ratio"] = compute_ratio(df, "set2", num_decimal=5)

In [7]: df
Out[7]:
             set1   set2  set1_ratio  set2_ratio
Train       59268  12700     0.82353     0.17647
Validation   6166   9892     0.38398     0.61600
Test          115    238     0.32578     0.67360

In [8]: df_rows = pd.DataFrame()

In [9]: df_rows["Train_ratio"] =  compute_ratio(df, "Train", num_decimal=5)

In [10]: df_rows["Validation_ratio"] =  compute_ratio(df, "Validation", num_decimal=5)

In [11]: df_rows["Test_ratio"] =  compute_ratio(df, "Test", num_decimal=5)

In [12]: df_rows
Out[12]:
            Train_ratio  Validation_ratio  Test_ratio
set1            0.90418           0.09407     0.00175
set2            0.55629           0.43329     0.01042
set1_ratio      0.53710           0.25043     0.21247
set2_ratio      0.12037           0.42017     0.45946

Upvotes: 0

Shivam

Reputation: 11

import pandas as pd
 
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# Calculate each value's percentage of the column total using apply() and a lambda.
total = df['B'].sum()  # compute the total once instead of once per row

df['B_Percentage'] = df['B'].apply(lambda x: (x / total) * 100)
 
print(df)

Using a lambda with apply() can be useful, although there are other ways to do this. This page may help: http://www.pythonpandas.com/how-to-calculate-the-percentage-of-a-column-in-pandas/
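As one of those other methods, here is a minimal sketch of the same calculation without apply(), assuming the same example data. pandas divides the whole column by the scalar total in a single vectorized operation:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# Vectorized: divide the entire column by its total, then scale to percent.
df['B_Percentage'] = df['B'] / df['B'].sum() * 100

print(df)
```

This should give the same values as the lambda version and is usually faster on large frames, since no Python function is called per row.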

Upvotes: 1

Charis Baafi

Reputation: 11

import pandas as pd

df = pd.read_excel("regional cases.xlsx")
df.head()

REGION   CUMILATIVE COUNTS  POPULATION
GREATER  12948              4943075
ASHANTI  4972               5792187
WESTERN  2051               2165241
CENTRAL  1071               2563228

# Note: multiplying by 100 twice yields cases per 10,000 people;
# use a single * 100 for a true percentage.
df['Percentage'] = round((df['CUMILATIVE COUNTS'] / df['POPULATION'] * 100) * 100, 2)
df.head()

REGION   CUMILATIVE COUNTS  POPULATION  Percentage
GREATER  12948              4943075     26.19
ASHANTI  4972               5792187     8.58
WESTERN  2051               2165241     9.47

Upvotes: 0

joemar.ct

Reputation: 1216

First, make the keys of your dictionary the index of your DataFrame:

 import pandas as pd
 a = {'Test 1': 4, 'Test 2': 1, 'Test 3': 1, 'Test 4': 9}
 p = pd.DataFrame([a])
 p = p.T  # transpose so the dict keys become the index
 p.columns = ['score']

Then, compute the percentage and assign to a new column.

 def compute_percentage(row):
      # row is a one-element Series; take the scalar score before dividing
      pct = row['score'] / p['score'].sum() * 100
      return round(pct, 2)

 p['percentage'] = p.apply(compute_percentage, axis=1)

This gives you:

         score  percentage
 Test 1      4       26.67
 Test 2      1        6.67
 Test 3      1        6.67
 Test 4      9       60.00

 [4 rows x 2 columns]
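A shorter route to the same table, offered as a sketch rather than a replacement for the steps above: a Series built from a dict already uses the keys as its index, so the transpose is not needed, and the percentage can be computed on the whole column at once. It works for a dict of any size.

```python
import pandas as pd

a = {'Test 1': 4, 'Test 2': 1, 'Test 3': 1, 'Test 4': 9}

# A Series built from a dict uses the dict keys as its index.
p = pd.Series(a, name='score').to_frame()

# Vectorized percentage of the column total, rounded to two decimals.
p['percentage'] = (p['score'] / p['score'].sum() * 100).round(2)

print(p)
```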

Upvotes: 7

FooBar

Reputation: 16488

If a score out of 10 is indeed what you want, the simplest way is to name the columns when you load the data and divide by 10:

>>> p = pd.DataFrame(a.items(), columns=['item', 'score'])
>>> p['perc'] = p['score']/10
>>> p
Out[370]: 
     item  score  perc
0  Test 2      1   0.1
1  Test 3      1   0.1
2  Test 1      4   0.4
3  Test 4      9   0.9

For real percentages (each score as a fraction of the total; multiply by 100 if you want values from 0 to 100), use this instead:

>>> p['perc']= p['score']/p['score'].sum()
>>> p
Out[427]: 
     item  score      perc
0  Test 2      1  0.066667
1  Test 3      1  0.066667
2  Test 1      4  0.266667
3  Test 4      9  0.600000

Upvotes: 44
