labalala
labalala

Reputation: 17

Adding a column based on the values from another dataframe pandas

I have a dataframe created using the following data.

data = {'B1':[65, 41, 17, 7], 
        'B2':[28, 40, 47, 30],
        'B3':[16, 29, 39, 44],
        'L':[6, 6.5, 7, 7.5]
        'U':[6.5, 7, 7.5, 8]}
df=pd.DataFrame(data)

I have another dataframe df1 as follows:

data1 = {'Value':[6, 6.5, 7, 7.5]}
df1=pd.DataFrame(data1)

For each row in df1, I want to get the row from dfwhere the value is between the 'L' and 'U' values and get the column name of the maximum value of the other columns and add it as a column to df1. For example, the output can be seen in the image below: Image

Here is the code I used:

maxval=[]
for index,row in df1.iterrows():
   tmp_val=df[df.columns[0:3]][(row.Value>= df.L) &(row.Value<df.U)]
   maxval.append(tmp_val.idxmax(axis=1))

I then add this column to df1. While this works, I have a dataframe that has 300,000 rows and it is taking a long time to do this. Are there any other ways to do this quicker?

Upvotes: 1

Views: 713

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

IIUC, your df['U'] is df['L'] shifted by 1. If so, you can still use idxmax but with a pd.cut:

row_map = pd.cut(df1['Value'], bins = [df['L'][0]] + df['U'].to_list(),
                 right=False,
                 labels=df.index)

df1['max_col'] = df.loc[row_map,df.columns[:3]].idxmax(1)

Output:

   Value max_col
0    6.0      B1
1    6.5      B1
2    7.0      B2
3    7.5      B3

Upvotes: 0

yatu
yatu

Reputation: 88226

Using idxmax:

df1.assign(max_col=df.idxmax(1))

    Value max_col
0    6.0      B1
1    6.5      B1
2    7.0      B2
3    7.5      B3

We can speed it up a little more working with the underlying numpy arrays and using argmax:

df1.assign(max_col=df.columns[df.values.argmax(1)])

Upvotes: 1

Related Questions