Reputation: 17
I have a dataframe created using the following data.
data = {'B1':[65, 41, 17, 7],
'B2':[28, 40, 47, 30],
'B3':[16, 29, 39, 44],
'L':[6, 6.5, 7, 7.5]
'U':[6.5, 7, 7.5, 8]}
df=pd.DataFrame(data)
I have another dataframe df1
as follows:
data1 = {'Value':[6, 6.5, 7, 7.5]}
df1=pd.DataFrame(data1)
For each row in df1
, I want to get the row from df
where the value is between the 'L'
and 'U'
values and get the column name of the maximum value of the other columns and add it as a column to df1
. For example, the output can be seen in the image below:
Image
Here is the code I used:
maxval=[]
for index,row in df1.iterrows():
tmp_val=df[df.columns[0:3]][(row.Value>= df.L) &(row.Value<df.U)]
maxval.append(tmp_val.idxmax(axis=1))
I then add this column to df1
. While this works, I have a dataframe that has 300,000 rows and it is taking a long time to do this. Are there any other ways to do this quicker?
Upvotes: 1
Views: 713
Reputation: 150735
IIUC, your df['U']
is df['L']
shifted by 1. If so, you can still use idxmax
but with a pd.cut
:
row_map = pd.cut(df1['Value'], bins = [df['L'][0]] + df['U'].to_list(),
right=False,
labels=df.index)
df1['max_col'] = df.loc[row_map,df.columns[:3]].idxmax(1)
Output:
Value max_col
0 6.0 B1
1 6.5 B1
2 7.0 B2
3 7.5 B3
Upvotes: 0