bit_scientist
bit_scientist

Reputation: 1481

how to map two dataframes on condition while having different rows

I have two dataframes that need to be mapped (or joined?) based on some condition. These are the dataframes:

df_1

      img_names   img_array
0         1_rel         253
1   1_rel_right         255
2     1_rel_top         250
3         4_rel         180
4   4_rel_right         182
5     4_rel_top         189
6         7_rel         217
7   7_rel_right         183
8     7_rel_top         196

df_2

  List_No    time
0       1      38
1       4      23
2       7      32

After mapping I would like to get the following dataframe:

df_3

      img_names   img_array    List_No    time   
0         1_rel         253          1      38
1   1_rel_right         255          1      38
2     1_rel_top         250          1      38
3         4_rel         180          4      23
4   4_rel_right         182          4      23
5     4_rel_top         189          4      23
6         7_rel         217          7      32
7   7_rel_right         183          7      32
8     7_rel_top         196          7      32

Basically, df_2's each row is populated 3 times to match the number of rows in df_1 and the mapping (if we can say so) is done by the split string in each row of df_1's img_name column. The names of row elements in img_names may have different names, but each of them always starts with the some number (1,4,7 in this case) and an undescore, etc. So I need to split the correspongding number in each row and map it with the row elements of List_No.

I hope the example above is clear.

Thank you.

Upvotes: 0

Views: 197

Answers (2)

sammywemmy
sammywemmy

Reputation: 28709

An alternative to @QuangHoang's answer (which I believe you should pick, as it is more robust). This uses the map method, and assumes every value in df2's time is in df1:

df1.assign(
    List_No=df1.img_names.str.extract(r"(\d)", expand=False).astype(int),
    time=lambda x: x.List_No.map(df2["time"]),
)


   img_names    img_array   List_No time
0   1_rel            253    1       38
1   1_rel_right      255    1       38
2   1_rel_top        250    1       38
3   4_rel            180    4       23
4   4_rel_right      182    4       23
5   4_rel_top        189    4       23
6   7_rel            217    7       32
7   7_rel_right      183    7       32
8   7_rel_top        196    7       32

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150785

Looks like you can just extract the digit parts and merge:

df_1['List_No'] = df_1['img_names'].str.split('_').str[0].astype(int)
df_3 = df_1.merge(df_2, on='List_No')

Output:

     img_names  img_array  List_No  time
0        1_rel        253        1    38
1  1_rel_right        255        1    38
2    1_rel_top        250        1    38
3        4_rel        180        4    23
4  4_rel_right        182        4    23
5    4_rel_top        189        4    23
6        7_rel        217        7    32
7  7_rel_right        183        7    32
8    7_rel_top        196        7    32

Upvotes: 3

Related Questions