Reputation: 173
I am working with a large multiIndex DataFrame results_matrix
with several columns. I need to access each element of the columns Results1
and Results2
and select one of those elements based on an indicator variable that is stored in the column Indicator
of the DataFrame.
Currently, I am accessing each of the elements in a for loop - however, this increases the calculation time of the program quite a lot. Is there a way to do this more efficiently?
import pandas as pd
import numpy as np
selected_results = pd.Series(np.nan)
# Used to iterate through the rows of the DataFrame
i = 0
for items in results_matrix['Indicator']:
if results_matrix.iloc[i]['Indicator'] == 1:
selected_results[i] = results_matrix.iloc[i]['Results1']
else:
selected_results[i] = results_matrix.iloc[i]['Results2']
i += 1
results_matrix['SelectedResults'] = selected_results.values
Upvotes: 1
Views: 36
Reputation: 862591
I think you need numpy.where
:
results_matrix['SelectedResults'] = np.where(results_matrix['Indicator'] == 1,
results_matrix['Results1'],
results_matrix['Results2'])
Sample:
results_matrix = pd.DataFrame({'Indicator':[1,2,3],
'Results1':[4,5,6],
'Results2':[7,8,9]})
print (results_matrix)
Indicator Results1 Results2
0 1 4 7
1 2 5 8
2 3 6 9
results_matrix['SelectedResults'] = np.where(results_matrix['Indicator'] == 1,
results_matrix['Results1'],
results_matrix['Results2'])
print (results_matrix)
Indicator Results1 Results2 SelectedResults
0 1 4 7 4
1 2 5 8 8
2 3 6 9 9
Upvotes: 2