NoSoyTuMadre
NoSoyTuMadre

Reputation: 40

Pandas left join with wildcard string match

I'm new to using pandas. I'm trying to search for a substring in one dataframe using a string from a different dataframe.

screenshot of dataframes

Then, I want to merge those two dataframes based upon this match. When merging, for the rows in one dataframe that don't match up with a row in the other dataframe, I want those rows to display Null values.

Basically, I want to take the data from the parcel column and use it to search the the strings within the parcel_id column of the other dataframe. If a match is found, I want to merge these 2 rows into 1 row. For those in the first dataframe that don't have a match in the other dataframe, I want Null values used in those rows when merging. How do I do this?

Upvotes: 1

Views: 950

Answers (1)

FloLie
FloLie

Reputation: 1841

As you are new, here are some hints for asking such a question:

As pointed out in the comments, you should add a minimal example such as below, that everybode can import, look at and play around with.

Also you should provide the ways you already tried to do so.

First of, wildcard merges to my knowledge are not possible, therefore data cleaning is required. I did so with the sales['parcel_id_new'] column that is just the ìd part of the string.

Then you use a left join merge, as it takes all entries from the first left table and matches all from right leaving NAN in all others

parcels = pd.DataFrame([
    {'parcel': '01 01234566789AB', 'other column': 'Package A'},
    {'parcel': '02 01234566789AB', 'other column': 'Package B'},
    ])

print(parcels)

sales = pd.DataFrame([
    {'parcel_id': '01 01234566789AB OTHER UNIMORTANT TEXT', 'other sales column': 'Sales A'},
    ])

sales['parcel_id_new'] = sales['parcel_id'].apply(lambda x: x[:16]) 
print(sales)

full_table = pd.merge(how='left',
         left= parcels, 
         right=sales, 
         left_on='parcel', right_on='parcel_id_new')
print(full_table)

Output:

PARCELS:
   parcel              other column
0  01 01234566789AB    Package A
1  02 01234566789AB    Package B

SALES:
   parcel_id other                         sales column     parcel_id_new
0  01 01234566789AB OTHER UNIMORTANT TEXT  Sales A          01 01234566789AB

RESULT:
             parcel    other column  parcel_id                               other sales column     parcel_id_new  
0  01 01234566789AB    Package A     01 01234566789AB OTHER UNIMORTANT TEXT  Sales A                01 01234566789AB 
1  02 01234566789AB    Package B     NaN                                     NaN                    NaN 

Upvotes: 1

Related Questions