Tomasz
Tomasz

Reputation: 568

Find values from other dataframe and assign to original dataframe

Having input dataframe:

    x_1  x_2
0  0.0  0.0
1  1.0  0.0
2  2.0  0.2
3  2.5  1.5
4  1.5  2.0
5 -2.0 -2.0

and additional dataframe as follows:

    index  x_1_x  x_2_x  x_1_y  x_2_y  value      dist  dist_rank
0       0    0.0    0.0    0.1    0.1    5.0  0.141421        2.0
4       0    0.0    0.0    1.5    1.0   -2.0  1.802776        3.0
5       0    0.0    0.0    0.0    0.0    3.0  0.000000        1.0
9       1    1.0    0.0    0.1    0.1    5.0  0.905539        1.0
11      1    1.0    0.0    2.0    0.4    3.0  1.077033        3.0
14      1    1.0    0.0    0.0    0.0    3.0  1.000000        2.0
18      2    2.0    0.2    0.1    0.1    5.0  1.902630        3.0
20      2    2.0    0.2    2.0    0.4    3.0  0.200000        1.0
22      2    2.0    0.2    1.5    1.0   -2.0  0.943398        2.0
29      3    2.5    1.5    2.0    0.4    3.0  1.208305        3.0
30      3    2.5    1.5    2.5    2.5    4.0  1.000000        1.0
31      3    2.5    1.5    1.5    1.0   -2.0  1.118034        2.0
38      4    1.5    2.0    2.0    0.4    3.0  1.676305        3.0
39      4    1.5    2.0    2.5    2.5    4.0  1.118034        2.0
40      4    1.5    2.0    1.5    1.0   -2.0  1.000000        1.0
45      5   -2.0   -2.0    0.1    0.1    5.0  2.969848        2.0
46      5   -2.0   -2.0    1.0   -2.0    6.0  3.000000        3.0
50      5   -2.0   -2.0    0.0    0.0    3.0  2.828427        1.0

I want to create new columns in input dataframe, basing on additional dataframe with respect to dist_rank. It should extract x_1_y, x_2_y and value for each row, with respect to index and dist_rank so my expected output is following: expected output

I tried following lines:

df['value_dist_rank1']=result.loc[result['dist_rank']==1.0, 'value']
df['value_dist_rank1 ']=result[result['dist_rank']==1.0]['value']

but both gave the same output:

   x_1  x_2  value_dist_rank1
0  0.0  0.0               NaN
1  1.0  0.0               NaN
2  2.0  0.2               NaN
3  2.5  1.5               NaN
4  1.5  2.0               NaN
5 -2.0 -2.0               3.0

Upvotes: 1

Views: 73

Answers (1)

vlemaistre
vlemaistre

Reputation: 3331

Here is a way to do it :

(For the sake of clarity I consider the input df as df1 and the additional df as df2)

# First we goupby df2 by index to get all the column information of each index on one line
df2 = df2.groupby('index').agg(lambda x: list(x)).reset_index()

# Then we explode each column into three columns since there is always three columns for each index
columns = ['dist_rank', 'value', 'x_1_y', 'x_2_y']
column_to_add = ['value', 'x_1_y', 'x_2_y']
for index, row in df2.iterrows():
    for i in range(3):
        column_names = ["{}_dist_rank{}".format(x, row.dist_rank[i])[:-2] for x in column_to_add]
        values = [row[x][i] for x in column_to_add]
        for column, value in zip(column_names, values):
            df2.loc[index, column] = value
# We drop the columns that are not useful :
df2.drop(columns=columns+['dist', 'x_1_x', 'x_2_x'], inplace = True)

# Finally we merge the modified df with our initial dataframe :
result = df1.merge(df2, left_index=True, right_on='index', how='left')

Output :

    x_1   x_2  index value_dist_rank2 x_1_y_dist_rank2 x_2_y_dist_rank2  \
0   0.0   0.0      0              5.0              0.1              0.1   
1   1.0   0.0      1              3.0              0.0              0.0   
2   2.0   0.2      2             -2.0              1.5              1.0   
3   2.5   1.5      3             -2.0              1.5              1.0   
4   1.5   2.0      4              4.0              2.5              2.5   
5  -2.0  -2.0      5              5.0              0.1              0.1   

  value_dist_rank3 x_1_y_dist_rank3 x_2_y_dist_rank3 value_dist_rank1  \
0             -2.0              1.5              1.0              3.0   
1              3.0              2.0              0.4              5.0   
2              5.0              0.1              0.1              3.0   
3              3.0              2.0              0.4              4.0   
4              3.0              2.0              0.4             -2.0   
5              6.0              1.0             -2.0              3.0   

  x_1_y_dist_rank1 x_2_y_dist_rank1  
0              0.0              0.0  
1              0.1              0.1  
2              2.0              0.4  
3              2.5              2.5  
4              1.5              1.0  
5              0.0              0.0  

Upvotes: 1

Related Questions