Reputation: 40
I'm new to using pandas. I'm trying to search for a substring in one dataframe using a string from a different dataframe.
Then, I want to merge those two dataframes based upon this match. When merging, for the rows in one dataframe that don't match up with a row in the other dataframe, I want those rows to display Null values.
Basically, I want to take the data from the parcel column and use it to search the the strings within the parcel_id column of the other dataframe. If a match is found, I want to merge these 2 rows into 1 row. For those in the first dataframe that don't have a match in the other dataframe, I want Null values used in those rows when merging. How do I do this?
Upvotes: 1
Views: 950
Reputation: 1841
As you are new, here are some hints for asking such a question:
As pointed out in the comments, you should add a minimal example such as below, that everybode can import, look at and play around with.
Also you should provide the ways you already tried to do so.
First of, wildcard merges to my knowledge are not possible, therefore data cleaning is required. I did so with the sales['parcel_id_new']
column that is just the ìd
part of the string.
Then you use a left join merge
, as it takes all entries from the first left
table and matches all from right
leaving NAN
in all others
parcels = pd.DataFrame([
{'parcel': '01 01234566789AB', 'other column': 'Package A'},
{'parcel': '02 01234566789AB', 'other column': 'Package B'},
])
print(parcels)
sales = pd.DataFrame([
{'parcel_id': '01 01234566789AB OTHER UNIMORTANT TEXT', 'other sales column': 'Sales A'},
])
sales['parcel_id_new'] = sales['parcel_id'].apply(lambda x: x[:16])
print(sales)
full_table = pd.merge(how='left',
left= parcels,
right=sales,
left_on='parcel', right_on='parcel_id_new')
print(full_table)
Output:
PARCELS:
parcel other column
0 01 01234566789AB Package A
1 02 01234566789AB Package B
SALES:
parcel_id other sales column parcel_id_new
0 01 01234566789AB OTHER UNIMORTANT TEXT Sales A 01 01234566789AB
RESULT:
parcel other column parcel_id other sales column parcel_id_new
0 01 01234566789AB Package A 01 01234566789AB OTHER UNIMORTANT TEXT Sales A 01 01234566789AB
1 02 01234566789AB Package B NaN NaN NaN
Upvotes: 1