Moose
Moose

Reputation: 11

join/merge pandas dataframes where index is substring of other index

I would like to join (or merge) 2 dataframes where the index of the second dataframe is a substring of the one of the first dataframe. I could do it by creating a temporary column that is a substring of the index and then join on this column, but I'm sure there's another more intelligent way to do it.

df1:

XX0001S1    4
XX0001S2    4.5
XX0001S3    4
XX0253S1    13
XX0254S3    5

df2

XX0001  good
XX0253  bad
XX0254  average

result should be:

XX0001S1    4   good
XX0001S2    4.5 good
XX0001S3    4   good
XX0253S1    13  bad
XX0254S3    5   average

Any ideas please?

M.

Upvotes: 1

Views: 256

Answers (1)

Bill Huang
Bill Huang

Reputation: 4648

Important: This solution may fail if regex-reserved characters were present in the short index.

Explicit creation of a new column can be avoided by using string accessor as indicated in this answer. Just map the short index found onto the corresponding text values.

Data

Names were assigned to sample data columns for convenience. Just replace them with the actual ones.

df
Out[75]:
          val1
XX0001S1   4.0
XX0001S2   4.5
XX0001S3   4.0
XX0253S1  13.0
XX0254S3   5.0

df2
Out[77]:
           text
XX0001     good
XX0253      bad
XX0254  average

Code

The solution can be put into an one-liner. I wrote it separately for clarity.

# 1. regex search pattern at string beginning
patt = "^(" + "|".join(df2.index.values) + ")"

# 2. find corresponding substrings via accessor
idx2_values = df.index.str.extract(patt, expand=False)

# 3.map index to the contents
df["text"] = df2.loc[idx2_values, "text"].values

# result
df
Out[108]: 
          val1     text
XX0001S1   4.0     good
XX0001S2   4.5     good
XX0001S3   4.0     good
XX0253S1  13.0      bad
XX0254S3   5.0  average

Upvotes: 1

Related Questions