How to merge two dataframes according to their indexes?

Question

I am trying to use Pandas for data analysis. I need to merge two dataframes according to their indexes. However, their indexes are totally different. The rule is that if the index of df2 is the substring of df1, then I should merge them. For example, df1.index == ['a/aa/aaa','b/bb/bbb', 'c/cc/ccc'], and df2.index == ['bb/bbb', 'ccc', 'hello']. Then df1 and df2 have two indexes in common, we should merge them based on these indexes. What should i do?

tlentali · Accepted Answer

Having your DataFrame :

>>> df1 = pd.DataFrame({'col_a': [1, 2, 3]}, index=['a/aa/aaa','b/bb/bbb', 'c/cc/ccc'])
>>> df2 = pd.DataFrame({'col_b': [4, 5, 6]}, index=['bb/bbb', 'ccc', 'hello'])

And changing the index to column :

>>> df1=df1.reset_index(drop=False)
>>> df1 = df1.rename(columns={'index': 'value_df1'})
>>> df1
    value_df1   col_a
0   a/aa/aaa    1
1   b/bb/bbb    2
2   c/cc/ccc    3

>>> df2=df2.reset_index(drop=False)
>>> df2 = df2.rename(columns={'index': 'value_df2'})
>>> df2
    value_df2       col_b
0   bb/bbb          4
1   ccc             5
2   hello           6

We merge both DataFrame on the joincolumn :

>>> df1['join'] = 1
>>> df2['join'] = 1
>>> dfFull = df1.merge(df2, on='join').drop('join', axis=1)
>>> dfFull
    value_df1   col_a   value_df2       col_b
0   a/aa/aaa    1       bb/bbb          4
1   a/aa/aaa    1       ccc             5
2   a/aa/aaa    1       hello           6
3   b/bb/bbb    2       bb/bbb          4
4   b/bb/bbb    2       ccc             5
5   b/bb/bbb    2       hello           6
6   c/cc/ccc    3       bb/bbb          4
7   c/cc/ccc    3       ccc             5
8   c/cc/ccc    3       hello           6

Then we use an apply to match the initial index value :

>>> df2.drop('join', axis=1, inplace=True)
>>> dfFull['match'] = dfFull.apply(lambda x: x['value_df1'].find(x['value_df2']), axis=1).ge(0)
>>> dfFull
    value_df1   col_a   value_df2       col_b   match
0   a/aa/aaa    1       bb/bbb          4       False
1   a/aa/aaa    1       ccc             5       False
2   a/aa/aaa    1       hello           6       False
3   b/bb/bbb    2       bb/bbb          4       True
4   b/bb/bbb    2       ccc             5       False
5   b/bb/bbb    2       hello           6       False
6   c/cc/ccc    3       bb/bbb          4       False
7   c/cc/ccc    3       ccc             5       True
8   c/cc/ccc    3       hello           6       False

Filtering on the row where the column match is True and dropping the match column, we get the expected result :

>>> dfFull[dfFull['match']].drop(['match'], axis=1)
    value_df1   col_a   value_df2   col_b
3   b/bb/bbb    2       bb/bbb      4       
7   c/cc/ccc    3       ccc         5

This solution is inspired by this post.

How to merge two dataframes according to their indexes?

Answers (2)

Related Questions