Reputation: 61094
I've got a dataframe of the form:
   2021  2022  2023
0     3     7     7
1     1     4     4
2     0     1     5
3     4     5     7
Now I'd like to find the accumulated percentages calculated relative to the last column (2023) across each row so that I'll end up with this:
    2021    2022   2023
0  42.86  100.00  100.0
1  25.00  100.00  100.0
2   0.00   20.00  100.0
3  57.14   71.43  100.0
I am able to obtain the desired output using:
data = []
colnames= list(df.columns)
for row in df.iterrows():
    data.append([elem/row[1][-1]*100 for elem in row][1].values)
df_acc = pd.DataFrame(data)
df_acc.columns = colnames
But this seems horribly inefficient: I have to iterate over all rows, use a list comprehension to compute the percentages with [elem/row[1][-1]*100 for elem in row][1].values, and then build a new dataframe.
Does anyone know of a better approach? Perhaps even one that uses inplace=True?
import pandas as pd
import numpy as np
# data
np.random.seed(1)
start = 2021
ncols = 3
nrows = 4
cols = [str(i) for i in np.arange(start, start+ncols)]
df = pd.DataFrame(np.random.randint(0,5, (nrows,ncols)), columns = cols).cumsum(axis = 1)
data = []
colnames= list(df.columns)
for row in df.iterrows():
    data.append([round(elem/row[1][-1]*100, 2) for elem in row][1].values)
    # data.append([elem/row[1][-1]*100 for elem in row][1].values)
df_acc = pd.DataFrame(data)
df_acc.columns = colnames
df_acc
Upvotes: 1
Views: 51
Reputation: 15872
You can use df.div to divide every row by the last column, then mul to multiply by 100, and round to 2 decimal places:
>>> df.div(df.iloc[:,-1], axis=0).mul(100).round(2)
    2021    2022   2023
0  42.86  100.00  100.0
1  25.00  100.00  100.0
2   0.00   20.00  100.0
3  57.14   71.43  100.0
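For anyone who wants to run this without the random setup, here is a minimal self-contained sketch. It assumes the frame from the question typed in by hand; the names last and df_acc are just illustrative.

```python
import pandas as pd

# Frame from the question, entered by hand instead of generated randomly.
df = pd.DataFrame(
    {"2021": [3, 1, 0, 4], "2022": [7, 4, 1, 5], "2023": [7, 4, 5, 7]}
)

# Last column as a Series; axis=0 aligns it with the row index,
# so every row is divided by its own 2023 value.
last = df.iloc[:, -1]
df_acc = df.div(last, axis=0).mul(100).round(2)

print(df_acc)
```

Note that div has no inplace parameter, so the closest thing to an in-place update is rebinding the name: df = df.div(df.iloc[:, -1], axis=0).mul(100).round(2). Also, a row whose last column is 0 would produce inf or NaN rather than an error.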
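If you want the percentage based on the max value of each row instead (identical here, since every row is a cumulative sum of non-negative numbers and its last column is therefore also its maximum):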
>>> df.div(df.max(1), axis=0).mul(100).round(2)
    2021    2022   2023
0  42.86  100.00  100.0
1  25.00  100.00  100.0
2   0.00   20.00  100.0
3  57.14   71.43  100.0
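The two variants only agree because the sample rows never decrease from left to right. A small sketch with a made-up frame (df2 and its values are hypothetical, not from the question) shows where they diverge:

```python
import pandas as pd

# Hypothetical frame: in row 0 the largest value is NOT in the last column.
df2 = pd.DataFrame({"2021": [2, 5], "2022": [8, 4], "2023": [4, 10]})

# Percentage relative to the last column of each row.
by_last = df2.div(df2.iloc[:, -1], axis=0).mul(100).round(2)

# Percentage relative to the largest value of each row.
by_max = df2.div(df2.max(axis=1), axis=0).mul(100).round(2)

print(by_last)
print(by_max)
```

In row 0, by_last goes above 100 for 2022 (8 is twice the 2023 value of 4), while by_max stays capped at 100. Which one is right depends on whether the final year or the peak is meant to be the reference.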
Upvotes: 1