Reputation: 765
heres the problem... Imagine the following dataframe as an example:
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 5, 6, 7],'col3': [3, 4, 5, 6, 7],'col4': [1, 2, 3, 3, 2]})
Now, I would like to add another column "col 5" which is calculated as follows:
if the value of "col4" is 1, then give me the corresponding value in the column with index 1 (i.e. "col2" in this case), if "col4" is 2 give me the corresponding value in the column with index 2 (i.e. "col3" in this case), etc.
I have tried the below and variations of it, but I can't seem to get the right result
df["col5"] = df.apply(lambda x: df.iloc[x,df[df.columns[df["col4"]]]])
Any help is much appreciated!
Upvotes: 1
Views: 237
Reputation: 164623
You can use fancy indexing with NumPy and avoid a Python-level loop altogether:
df['col5'] = df.iloc[:, :4].values[np.arange(df.shape[0]), df['col4']]
print(df)
col1 col2 col3 col4 col5
0 1 3 3 1 3
1 2 4 4 2 4
2 3 5 5 3 3
3 4 6 6 3 3
4 5 7 7 2 7
You should see significant performance benefits for larger dataframes:
df = pd.concat([df]*10**4, ignore_index=True)
%timeit df.apply(lambda x: x[df.columns[x['col4']]], axis=1) # 2.36 s per loop
%timeit df.iloc[:, :4].values[np.arange(df.shape[0]), df['col4']] # 1.01 ms per loop
Upvotes: 1
Reputation: 27869
If your 'col4'
is the indicator of column index, this will work:
df['col5'] = df.apply(lambda x: x[df.columns[x['col4']]], axis=1)
df
# col1 col2 col3 col4 col5
#0 1 3 3 1 3
#1 2 4 4 2 4
#2 3 5 5 3 3
#3 4 6 6 3 3
#4 5 7 7 2 7
Upvotes: 2