Reputation: 7679
How can I add a percentage column to a pandas DataFrame? The dict can change in size.
>>> import pandas as pd
>>> a = {'Test 1': 4, 'Test 2': 1, 'Test 3': 1, 'Test 4': 9}
>>> p = pd.DataFrame(a.items())
>>> p
        0  1
0  Test 2  1
1  Test 3  1
2  Test 1  4
3  Test 4  9
[4 rows x 2 columns]
Upvotes: 18
Views: 148257
Reputation: 41
I take the following approach when exploring model training data.
In [1]: import pandas as pd
In [2]: d = {"set1":[59268, 6166, 115], "set2":[12700, 9892, 238]}
...: idx_labels = ["Train", "Validation", "Test"]
...: df = pd.DataFrame(data=d, index=idx_labels)
In [3]: df
Out[3]:
             set1   set2
Train       59268  12700
Validation   6166   9892
Test          115    238
In [4]: def compute_ratio(df, target, num_decimal: int = 4) -> pd.Series:
...:     if target in df.columns:
...:         # share of one column within each row total
...:         divider = df.sum(axis=1)
...:         pct = df.loc[:, target] / divider
...:     elif target in df.index:
...:         # share of one row within each column total
...:         divider = df.sum(axis=0)
...:         pct = df.loc[target, :] / divider
...:     else:
...:         raise KeyError(target)
...:     return round(pct, num_decimal)
...:
In [5]: counts = df[["set1", "set2"]]  # totals must come from the raw counts only
In [6]: df["set1_ratio"] = compute_ratio(counts, "set1", num_decimal=5)
In [7]: df["set2_ratio"] = compute_ratio(counts, "set2", num_decimal=5)
In [8]: df
Out[8]:
             set1   set2  set1_ratio  set2_ratio
Train       59268  12700     0.82353     0.17647
Validation   6166   9892     0.38398     0.61602
Test          115    238     0.32578     0.67422
In [9]: df_rows = pd.DataFrame()
In [10]: df_rows["Train_ratio"] = compute_ratio(counts, "Train", num_decimal=5)
In [11]: df_rows["Validation_ratio"] = compute_ratio(counts, "Validation", num_decimal=5)
In [12]: df_rows["Test_ratio"] = compute_ratio(counts, "Test", num_decimal=5)
In [13]: df_rows
Out[13]:
      Train_ratio  Validation_ratio  Test_ratio
set1      0.90418           0.09407     0.00175
set2      0.55629           0.43329     0.01042
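The row-wise and column-wise shares can also be sketched with DataFrame.div, which avoids a helper function. This is only an illustration on the same counts; the names counts, row_share and col_share are made up for the example.

```python
import pandas as pd

counts = pd.DataFrame(
    {"set1": [59268, 6166, 115], "set2": [12700, 9892, 238]},
    index=["Train", "Validation", "Test"],
)

# Share of each set within a split: divide every row by its row total.
row_share = counts.div(counts.sum(axis=1), axis=0).round(5)

# Share of each split within a set: divide every column by its column total.
col_share = counts.div(counts.sum(axis=0), axis=1).round(5)

print(row_share)
print(col_share)
```

Because the shares are computed from counts alone and stored in separate frames, adding ratio columns later cannot leak into the totals.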
Upvotes: 0
Reputation: 11
import pandas as pd
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
# calculate percentage using apply() method and lambda function
df['B_Percentage'] = df['B'].apply(lambda x: (x / df['B'].sum()) * 100)
print(df)
Using apply() with a lambda works here, although other methods can do the same. This page may help: http://www.pythonpandas.com/how-to-calculate-the-percentage-of-a-column-in-pandas/
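One of those other methods, as a minimal sketch: dividing the whole column at once gives the same numbers without calling a Python function per row, and the column sum is computed only once. It reuses the sample data from this answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})

# Vectorized: a single division over the whole column.
df["B_Percentage"] = df["B"] / df["B"].sum() * 100

print(df)
```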
Upvotes: 1
Reputation: 11
import pandas as pd
df = pd.read_excel("regional cases.xlsx")
df.head()
REGION   CUMILATIVECOUNTS  POPULATION
GREATER             12948     4943075
ASHANTI              4972     5792187
WESTERN              2051     2165241
CENTRAL              1071     2563228
df['Percentage'] = round(df['CUMILATIVECOUNTS'] / df['POPULATION'] * 100, 2)
df.head()
REGION   CUMILATIVECOUNTS  POPULATION  Percentage
GREATER             12948     4943075        0.26
ASHANTI              4972     5792187        0.09
WESTERN              2051     2165241        0.09
CENTRAL              1071     2563228        0.04
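A self-contained sketch of the same calculation, using an inline copy of the rows above in place of the Excel file. The column name Per10k is made up here; it illustrates that multiplying the ratio by 100 twice gives cases per 10,000 people rather than a percentage.

```python
import pandas as pd

# Inline copy of the rows shown above, standing in for the Excel file.
df = pd.DataFrame(
    {
        "REGION": ["GREATER", "ASHANTI", "WESTERN", "CENTRAL"],
        "CUMILATIVECOUNTS": [12948, 4972, 2051, 1071],
        "POPULATION": [4943075, 5792187, 2165241, 2563228],
    }
)

ratio = df["CUMILATIVECOUNTS"] / df["POPULATION"]

# Percentage of the population: multiply the ratio by 100 once.
df["Percentage"] = (ratio * 100).round(2)

# Cases per 10,000 residents: a rate, not a percentage.
df["Per10k"] = (ratio * 10_000).round(2)

print(df)
```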
Upvotes: 0
Reputation: 1216
First, make the keys of your dictionary the index of your DataFrame:
import pandas as pd
a = {'Test 1': 4, 'Test 2': 1, 'Test 3': 1, 'Test 4': 9}
p = pd.DataFrame([a])
p = p.T  # transpose so the keys become the index
p.columns = ['score']
Then, compute the percentage and assign to a new column.
def compute_percentage(x):
    # x is a single score; divide by the column total and scale to percent
    pct = x / p['score'].sum() * 100
    return round(pct, 2)
p['percentage'] = p['score'].apply(compute_percentage)
This gives you:
        score  percentage
Test 1      4       26.67
Test 2      1        6.67
Test 3      1        6.67
Test 4      9       60.00
[4 rows x 2 columns]
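As a possible shortcut, and only a sketch of the same idea: building a Series from the dict puts the keys in the index directly, so no transpose is needed, and it works whatever the size of the dict.

```python
import pandas as pd

a = {"Test 1": 4, "Test 2": 1, "Test 3": 1, "Test 4": 9}

# A Series built from a dict uses the dict keys as its index.
p = pd.Series(a, name="score").to_frame()

# Vectorized share of the total, as a percentage rounded to 2 decimals.
p["percentage"] = (p["score"] / p["score"].sum() * 100).round(2)

print(p)
```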
Upvotes: 7
Reputation: 16488
If a percentage of 10 is indeed what you want, the simplest way is to adjust how you load the data slightly:
>>> p = pd.DataFrame(a.items(), columns=['item', 'score'])
>>> p['perc'] = p['score']/10
>>> p
Out[370]:
     item  score  perc
0  Test 2      1   0.1
1  Test 3      1   0.1
2  Test 1      4   0.4
3  Test 4      9   0.9
For real percentages, instead:
>>> p['perc']= p['score']/p['score'].sum()
>>> p
Out[427]:
     item  score      perc
0  Test 2      1  0.066667
1  Test 3      1  0.066667
2  Test 1      4  0.266667
3  Test 4      9  0.600000
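If the fractions should be displayed as percentages, one option is to keep the numeric column and add a formatted one next to it. A minimal sketch follows; the column name perc_label is an assumption made for the example.

```python
import pandas as pd

a = {"Test 1": 4, "Test 2": 1, "Test 3": 1, "Test 4": 9}
p = pd.DataFrame(list(a.items()), columns=["item", "score"])

# Keep the numeric fraction for further calculations.
p["perc"] = p["score"] / p["score"].sum()

# Separate display column: format each fraction as a percent string.
p["perc_label"] = p["perc"].map("{:.2%}".format)

print(p)
```

Keeping perc numeric means it can still be sorted or summed; perc_label holds strings and is meant for display only.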
Upvotes: 44