Reputation: 4122
I'm failing to loop through the values of select dataframe columns in order to create new columns representing percentage values. Reproducible example:
data = {'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7]}
df = pd.DataFrame(data)
df
Respondents answer_1 answer_2 answer_3
0 90 51 11 3
1 43 15 12 8
2 89 15 14 4
3 89 61 35 0
4 67 16 36 2
5 88 14 78 7
6 73 15 12 10
7 78 1 0 11
8 62 0 26 6
9 101 16 78 7
The aim is to compute the percentage for each of the answers columns against the total respondents. For example, for the new answer_1
column – let's name it answer_1_perc
– the first value would be 46 (because 51 is 46% of 90), the next value would be 35 (15 is 35% of 43). Then there would be answer_2_perc
and answer_3_perc
columns.
I've written so many iterations of the following code my head's spinning.
for columns in df.iloc[:, 1:4]:
for i in columns:
i_name = 'percentage_' + str(columns)
i_group = ([i] / df['Respondents'] * 100)
df[i_name] = i_group
What is the best way to do this? I need to use an iterative method as my actual data has 25 answer columns rather than the 3 shown in this example.
Upvotes: 2
Views: 339
Reputation: 22832
I recommend using div and concat:
df['Respondents'] = df['Respondents'].astype(float)
df_pct = (df.drop('Respondents', axis=1)
.div(df['Respondents'], axis=0)
.mul(100)
.rename(columns=lambda col: 'percentage_' + col)
)
pd.concat([df, df_pct], axis=1)
Respondents answer_1 answer_2 answer_3 percentage_answer_1 \
0 90.0 51 11 3 56.666667
1 43.0 15 12 8 34.883721
2 89.0 15 14 4 16.853933
3 89.0 61 40 0 68.539326
4 67.0 16 36 2 23.880597
5 88.0 14 78 7 15.909091
6 73.0 15 12 10 20.547945
7 78.0 1 0 11 1.282051
8 62.0 0 26 6 0.000000
9 101.0 16 78 7 15.841584
percentage_answer_2 percentage_answer_3
0 12.222222 3.333333
1 27.906977 18.604651
2 15.730337 4.494382
3 44.943820 0.000000
4 53.731343 2.985075
5 88.636364 7.954545
6 16.438356 13.698630
7 0.000000 14.102564
8 41.935484 9.677419
9 77.227723 6.930693
Upvotes: 2
Reputation: 862921
Another solution with div
desired columns by column Respondents
and then add to new columns names:
print ('percentage_' + df.columns[1:4])
Index(['percentage_answer_1', 'percentage_answer_2', 'percentage_answer_3'], dtype='object')
df['percentage_' + df.columns[1:4]] = df.ix[:,1:4].div(df.Respondents, axis=0) * 100
print (df)
Respondents answer_1 answer_2 answer_3 percentage_answer_1 \
0 90 51 11 3 56.666667
1 43 15 12 8 34.883721
2 89 15 14 4 16.853933
3 89 61 40 0 68.539326
4 67 16 36 2 23.880597
5 88 14 78 7 15.909091
6 73 15 12 10 20.547945
7 78 1 0 11 1.282051
8 62 0 26 6 0.000000
9 101 16 78 7 15.841584
percentage_answer_2 percentage_answer_3
0 12.222222 3.333333
1 27.906977 18.604651
2 15.730337 4.494382
3 44.943820 0.000000
4 53.731343 2.985075
5 88.636364 7.954545
6 16.438356 13.698630
7 0.000000 14.102564
8 41.935484 9.677419
9 77.227723 6.930693
Upvotes: 0
Reputation: 394099
You almost had it, note that you have string values in respondents col which I've corrected prior to calling the following:
In [172]:
for col in df.columns[1:4]:
i_name = 'percentage_' + col
i_group = (df[col] / df['Respondents']) * 100
df[i_name] = i_group
df
Out[172]:
Respondents answer_1 answer_2 answer_3 percentage_answer_1 \
0 90 51 11 3 56.666667
1 43 15 12 8 34.883721
2 89 15 14 4 16.853933
3 89 61 40 0 68.539326
4 67 16 36 2 23.880597
5 88 14 78 7 15.909091
6 73 15 12 10 20.547945
7 78 1 0 11 1.282051
8 62 0 26 6 0.000000
9 101 16 78 7 15.841584
percentage_answer_2 percentage_answer_3
0 12.222222 3.333333
1 27.906977 18.604651
2 15.730337 4.494382
3 44.943820 0.000000
4 53.731343 2.985075
5 88.636364 7.954545
6 16.438356 13.698630
7 0.000000 14.102564
8 41.935484 9.677419
9 77.227723 6.930693
Upvotes: 4