I'm looking for a quick (vectorized) way to perform calculations using the contents of a pandas DataFrame.
Each row of my DataFrame contains 2 labels. For each label I want to look up the corresponding value (from a dictionary / list), perform a calculation on the two values, and store the result in a new column of the DataFrame.
My working example, which uses a loop, is below.
import numpy as np
import pandas as pd

label1s = np.array(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], dtype=str)
label2s = np.array(['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'], dtype=str)
data = np.column_stack([label1s, label2s])
# External lookup table: one numeric value per label
label_values = {'A': 1, 'B': 2, 'C': 3}
df = pd.DataFrame(data=data, columns=['Label1', 'Label2'])
new_col = np.zeros_like(label1s, dtype=float)
# Row-by-row loop: look up both labels and store the difference
for index, row in df.iterrows():
    val1 = label_values[row['Label1']]
    val2 = label_values[row['Label2']]
    new_col[index] = val1 - val2
df['result'] = new_col
df
However, for large datasets, looping over the rows like this is slow.
Is there a way to optimize this, please?
I've explored some of the pandas functionality like "lookup", but this seems to require equal-sized arrays, whereas in my case I need to look up values from an external dictionary / list whose size differs from that of the DataFrame.
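For reference, here is a minimal sketch of what a loop-free version of this lookup might look like. It uses `Series.map` with the dictionary and assumes every label in both columns is a key in `label_values` (a missing label would become NaN rather than raising an error). It is one possible approach, not necessarily the fastest.

```python
import pandas as pd

# Same data as in the loop example above
label_values = {'A': 1, 'B': 2, 'C': 3}
df = pd.DataFrame({
    'Label1': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'Label2': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
})

# Series.map translates every label through the dictionary in one call;
# the dictionary can be any size, independent of the DataFrame's length.
val1 = df['Label1'].map(label_values)
val2 = df['Label2'].map(label_values)

# Element-wise subtraction on whole columns replaces the per-row loop.
# The cast to float matches the dtype produced by the loop version.
df['result'] = (val1 - val2).astype(float)
```

The `result` column here holds the same differences as the loop version, because both look up each label in `label_values` and subtract the second value from the first.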