How to find accumulated percentages for dataframe rows?

Question

I've got a dataframe of the form:

   2021  2022  2023
0     3     7     7
1     1     4     4
2     0     1     5
3     4     5     7

Now I'd like to find the accumulated percentages calculated relative to the last column (2023) across each row so that I'll end up with this:

    2021    2022   2023
0  42.86  100.00  100.0
1  25.00  100.00  100.0
2   0.00   20.00  100.0
3  57.14   71.43  100.0

I am able to obtain the desired output using:

data = []
colnames= list(df.columns)
for row in df.iterrows():
    data.append([elem/row[1][-1]*100 for elem in row][1].values)
df_acc = pd.DataFrame(data)
df_acc.columns = colnames

But this seems horribly inefficient: I have to iterate over all rows, use a list comprehension to find the percentages with [elem/row[1][-1]*100 for elem in row][1].values, and then build a new dataframe.

Does anyone know of a better approach? Perhaps even one that uses inplace=True?

Complete code with data sample:

import pandas as pd
import numpy as np

# data
np.random.seed(1)
start = 2021
ncols = 3
nrows = 4
cols = [str(i) for i in np.arange(start, start+ncols)]
df = pd.DataFrame(np.random.randint(0,5, (nrows,ncols)), columns = cols).cumsum(axis = 1)

data = []
colnames= list(df.columns)
for row in df.iterrows():
    data.append([round(elem/row[1][-1]*100, 2) for elem in row][1].values)
#     data.append([elem/row[1][-1]*100 for elem in row][1].values)
df_acc = pd.DataFrame(data)
df_acc.columns = colnames
df_acc

Sayandip Dutta · Accepted Answer

You can divide the dataframe by its last column with df.div, then multiply by 100 and round to 2 decimal places:

>>> df.div(df.iloc[:,-1], axis=0).mul(100).round(2)
    2021    2022   2023
0  42.86  100.00  100.0
1  25.00  100.00  100.0
2   0.00   20.00  100.0
3  57.14   71.43  100.0
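
For reference, here is a self-contained sketch of the same approach, with the question's sample values typed in by hand instead of generated randomly. The name df_acc is borrowed from the question. On the inplace=True part: DataFrame.div does not take an inplace argument, so the sketch simply assigns the result to a name (assigning back to df would overwrite the original).

```python
import pandas as pd

# Sample values copied from the question's dataframe
df = pd.DataFrame(
    {"2021": [3, 1, 0, 4], "2022": [7, 4, 1, 5], "2023": [7, 4, 5, 7]}
)

# Divide each row by its value in the last column; axis=0 aligns the
# divisor Series with the row index. Then scale to percent and round.
df_acc = df.div(df.iloc[:, -1], axis=0).mul(100).round(2)
print(df_acc)
```

Note that a row whose last value is 0 would produce NaN or inf here, which the sample data happens to avoid.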

If you want the percentage based on the max value of each row:

>>> df.div(df.max(axis=1), axis=0).mul(100).round(2)
    2021    2022   2023
0  42.86  100.00  100.0
1  25.00  100.00  100.0
2   0.00   20.00  100.0
3  57.14   71.43  100.0
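
The two outputs are identical here because the sample rows are cumulative sums of non-negative numbers, so the last column is also the largest value in each row. A small sketch with made-up values (the frame demo and its column names are hypothetical, not from the question) shows where the two versions differ:

```python
import pandas as pd

# Hypothetical data: the second row does not peak in the last column
demo = pd.DataFrame({"a": [2, 8], "b": [4, 10], "c": [8, 5]})

# Relative to the last column of each row
by_last = demo.div(demo.iloc[:, -1], axis=0).mul(100).round(2)

# Relative to the largest value in each row
by_max = demo.div(demo.max(axis=1), axis=0).mul(100).round(2)

print(by_last)
print(by_max)
```

The first row gives the same percentages either way. In the second row the last-column version goes above 100, while the row-max version stays capped at 100.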
