Reputation: 4122

Pandas: compute numerous columns of percentage values

I'm failing to loop through the values of select dataframe columns in order to create new columns representing percentage values. Reproducible example:

    import pandas as pd

    data = {'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
            'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
            'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
            'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7]}
    df = pd.DataFrame(data)
    df

    Respondents  answer_1   answer_2   answer_3
0   90           51         11         3
1   43           15         12         8
2   89           15         14         4
3   89           61         40         0
4   67           16         36         2
5   88           14         78         7
6   73           15         12         10
7   78           1          0          11
8   62           0          26         6
9   101          16         78         7

The aim is to compute the percentage for each of the answer columns against the total respondents. For example, for the new answer_1 column (let's name it answer_1_perc), the first value would be 57 (because 51 is about 57% of 90), and the next value would be 35 (15 is about 35% of 43). Then there would be answer_2_perc and answer_3_perc columns.
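As a quick check of the arithmetic (each percentage is the answer count divided by the respondent count, times 100), the first two rows of answer_1 work out like this in plain Python:

```python
# Worked check: percentage = answer / respondents * 100
answers = [51, 15]       # answer_1, rows 0 and 1
respondents = [90, 43]   # Respondents, rows 0 and 1

percentages = [a / r * 100 for a, r in zip(answers, respondents)]
print([round(p, 2) for p in percentages])  # → [56.67, 34.88]
```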

I've written so many iterations of the following code my head's spinning.

    for columns in df.iloc[:, 1:4]:
        for i in columns:
            i_name = 'percentage_' + str(columns)
            i_group = ([i] / df['Respondents'] * 100)
            df[i_name] = i_group

What is the best way to do this? I need to use an iterative method as my actual data has 25 answer columns rather than the 3 shown in this example.

Upvotes: 2

Answers (3)

Kamil Sindi

Reputation: 22832

I recommend using div and concat:

    df['Respondents'] = df['Respondents'].astype(float)
    df_pct = (df.drop('Respondents', axis=1)
                .div(df['Respondents'], axis=0)
                .mul(100)
                .rename(columns=lambda col: 'percentage_' + col)
              )
    pd.concat([df, df_pct], axis=1)

   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0         90.0        51        11         3            56.666667   
1         43.0        15        12         8            34.883721   
2         89.0        15        14         4            16.853933   
3         89.0        61        40         0            68.539326   
4         67.0        16        36         2            23.880597   
5         88.0        14        78         7            15.909091   
6         73.0        15        12        10            20.547945   
7         78.0         1         0        11             1.282051   
8         62.0         0        26         6             0.000000   
9        101.0        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693
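If the real data has other non-answer columns besides Respondents, dropping only Respondents would divide those too. A variation of the same div approach, sketched under the assumption that every answer column name starts with answer_, selects the columns by pattern with DataFrame.filter and names the results with add_prefix, so it works unchanged for 3 or 25 answer columns:

```python
import pandas as pd

# Question's sample data, with Respondents already numeric
# (assumption: the string values were converted beforehand).
df = pd.DataFrame({
    'Respondents': [90, 43, 89, 89, 67, 88, 73, 78, 62, 101],
    'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
    'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
    'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7],
})

# Select answer columns by name pattern rather than by position.
answers = df.filter(regex=r'^answer_')

# Divide each answer column row-wise by Respondents, scale to percent,
# and prefix the new column names.
pct = answers.div(df['Respondents'], axis=0).mul(100).add_prefix('percentage_')
df = pd.concat([df, pct], axis=1)
print(df.columns.tolist())
```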

Upvotes: 2

jezrael

Reputation: 862921

Another solution: divide the desired columns by the Respondents column with div, then assign the result to the new column names:

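    # The question's Respondents column holds strings; make it numeric first
    df['Respondents'] = df['Respondents'].astype(int)
    print('percentage_' + df.columns[1:4])
Index(['percentage_answer_1', 'percentage_answer_2', 'percentage_answer_3'], dtype='object')

    # .ix was removed in pandas 1.0; .iloc selects the same columns by position
    df['percentage_' + df.columns[1:4]] = df.iloc[:, 1:4].div(df['Respondents'], axis=0) * 100
    print(df)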
   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0           90        51        11         3            56.666667   
1           43        15        12         8            34.883721   
2           89        15        14         4            16.853933   
3           89        61        40         0            68.539326   
4           67        16        36         2            23.880597   
5           88        14        78         7            15.909091   
6           73        15        12        10            20.547945   
7           78         1         0        11             1.282051   
8           62         0        26         6             0.000000   
9          101        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693
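The question describes whole-number percentages, while the answers produce floats. If whole percentages are wanted, one option is to round before storing. A small sketch on the first three rows, using DataFrame.round (which rounds half to even) and astype(int); note that astype(int) raises if any result is NaN or infinite, for example in a row with zero respondents:

```python
import pandas as pd

# First three rows of the question's data (Respondents already numeric).
df = pd.DataFrame({
    'Respondents': [90, 43, 89],
    'answer_1': [51, 15, 15],
    'answer_2': [11, 12, 14],
})

pct = df[['answer_1', 'answer_2']].div(df['Respondents'], axis=0) * 100

# Round to the nearest whole percent, store as integers, and attach the
# new columns by index alignment.
df = df.join(pct.round().astype(int).add_prefix('percentage_'))
print(df)
```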

Upvotes: 0

EdChum

Reputation: 394099

You almost had it. Note that your Respondents column contains string values, which I converted to numbers before running the following:

In [172]:

    for col in df.columns[1:4]:
        i_name = 'percentage_' + col
        i_group = (df[col] / df['Respondents']) * 100
        df[i_name] = i_group

df
Out[172]:
   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0           90        51        11         3            56.666667   
1           43        15        12         8            34.883721   
2           89        15        14         4            16.853933   
3           89        61        40         0            68.539326   
4           67        16        36         2            23.880597   
5           88        14        78         7            15.909091   
6           73        15        12        10            20.547945   
7           78         1         0        11             1.282051   
8           62         0        26         6             0.000000   
9          101        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693
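The answer mentions converting the string values in Respondents but doesn't show that step. A minimal sketch of one way to do it, assuming pd.to_numeric is acceptable (it raises on values it can't parse unless errors='coerce' is passed), combined with the loop above:

```python
import pandas as pd

# The question's data: Respondents mixes ints and strings such as '89'.
data = {'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
        'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
        'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
        'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7]}
df = pd.DataFrame(data)

# Convert the mixed int/str column to a numeric dtype before dividing.
df['Respondents'] = pd.to_numeric(df['Respondents'])

# Same loop as above: one new percentage column per answer column.
for col in df.columns[1:4]:
    df['percentage_' + col] = df[col] / df['Respondents'] * 100
```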

Upvotes: 4
