Reputation: 11
I would like to join (or merge) 2 dataframes where the index of the second dataframe is a substring of the one of the first dataframe. I could do it by creating a temporary column that is a substring of the index and then join on this column, but I'm sure there's another more intelligent way to do it.
df1:
XX0001S1 4
XX0001S2 4.5
XX0001S3 4
XX0253S1 13
XX0254S3 5
df2
XX0001 good
XX0253 bad
XX0254 average
result should be:
XX0001S1 4 good
XX0001S2 4.5 good
XX0001S3 4 good
XX0253S1 13 bad
XX0254S3 5 average
Any ideas please?
M.
Upvotes: 1
Views: 256
Reputation: 4648
Important: This solution may fail if regex-reserved characters were present in the short index.
Explicit creation of a new column can be avoided by using string accessor as indicated in this answer. Just map the short index found onto the corresponding text values.
Data
Names were assigned to sample data columns for convenience. Just replace them with the actual ones.
df
Out[75]:
val1
XX0001S1 4.0
XX0001S2 4.5
XX0001S3 4.0
XX0253S1 13.0
XX0254S3 5.0
df2
Out[77]:
text
XX0001 good
XX0253 bad
XX0254 average
Code
The solution can be put into an one-liner. I wrote it separately for clarity.
# 1. regex search pattern at string beginning
patt = "^(" + "|".join(df2.index.values) + ")"
# 2. find corresponding substrings via accessor
idx2_values = df.index.str.extract(patt, expand=False)
# 3.map index to the contents
df["text"] = df2.loc[idx2_values, "text"].values
# result
df
Out[108]:
val1 text
XX0001S1 4.0 good
XX0001S2 4.5 good
XX0001S3 4.0 good
XX0253S1 13.0 bad
XX0254S3 5.0 average
Upvotes: 1