krock1516
krock1516

Reputation: 461

How to compare the values of pandas columns between two DataFrames

I'm Just trying to understand to get the values of One DataFrame based on the specific column ie in this case column IDs which is present in both DataFrames, i'm looking forward to match the values based on df1s column IDs with df2s column IDs So, if the values of df1.Keywords is in df2.Name but df2.Name having more then one value which would be a considerable point here print the Boolean values True or False based on that.

Note: df1.Keywords isin df2.Name based on the IDs on booth the DataFrames.

DataFrame First df1

>>> df1
    IDs    Keywords
0  1234       APPLE
1  1234      ORANGE
2  1234      LEMONS
3  5346      ORANGE
4  5346  STRAWBERRY
5  5346   BLUEBERRY
6  8793         TEA

DataFrame Second df2

>>> df2
    IDs              Name
0  1234        APPLE ABCD
1  5346        APPLE ABCD
2  1234    STRAWBERRY YES
3  8793  ORANGE AVAILABLE
4  8793     TEA AVAILABLE

Expected :

IDs    Name               New_Bools
1234   APPLE ABCD ONE     True
5346   APPLE ABCD         False
1234   STRAWBERRY YES     False
8793   ORANGE AVAILABLE   False
8793   TEA AVAILABLE      False
8793   TEA COFFEE         True

I have tried creating MultiIndex with isin but not working.

index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['IDs', 'Keywords']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['IDs', 'Name']])

df1.IDs.isin(df2.IDs)

Upvotes: 1

Views: 98

Answers (2)

Franco Piccolo
Franco Piccolo

Reputation: 7410

Use merge to associate Keywords with Name via IDs, after you groupby IDs and Name, apply lambda to calculate if any name contains the keywords in its group:

df.merge(df2).groupby(['IDs','Name']).apply(lambda x: any(x['Name'].str.contains('|'.join(x['Keywords'])))).rename('flag').reset_index()

    IDs     Name              flag
0   1234    APPLE ABCD        True
1   1234    STRAWBERRY YES    False
2   5346    APPLE ABCD        False
3   8793    ORANGE AVAILABLE  False
4   8793    TEA AVAILABLE     True

Upvotes: 1

rs311
rs311

Reputation: 363

If you need to create a boolean based on one value only...

Depending on what you're trying to match (which seems like only Id from your question) Assuming you don't care about matching your Name and Keyword columns, then using isin is appropriate and something like the below should give you your boolean flag.

import pandas as pd

data = {'Id': [1, 2, 3, 1, 2, 3, 1, 2], 'Val': ['ABC', 'BCD', 'CDE', 'DEF', 'EFG', 'HIJ', 'IJK', 'JKL']}
data2 = {'Id': [1, 4, 7, 1, 2, 0, 1, 5], 'Val': ['ABC pld', 'BCD iod', 'CDE jkdf', 'DEF uyt', 'EFG erf', 'HIJ dfd', 'IJK mnb', 'JKL jkdf']}


df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)

df['New_Bools'] = df['Id'].isin(df2['Id'])
print(df)

Outputs

   Id  Val       Id_In_DF2
0   1  ABC       True
1   2  BCD       True
2   3  CDE      False
3   1  DEF       True
4   2  EFG       True
5   3  HIJ      False
6   1  IJK       True
7   2  JKL       True

If you need to match on multiple columns and values...

If you do care about matching on the names (your expected output), you'll need to do additional string manipulation to isolate the fruits in df2. Continuing with the data in my example, you'll be able to isolate your fruits using apply, lambda, and split on your strings. Index 0 will give you your fruits.

Lastly, you can then use a mask to compare the data between the two frames and output your boolean.

# string manipulation for df2
df2['Name'] = df2['Val'].apply(lambda x: x.split(' ')[0])

# perform your eval like this:
df['New_Bools_Id_And_Name'] = ((df['Id'] == df2['Id']) & (df['Val'] == df2['Name']))

Which outputs

    Id  Val  New_Bools  New_Bools_Id_And_Name
0   1  ABC       True                   True
1   2  BCD       True                  False
2   3  CDE      False                  False
3   1  DEF       True                   True
4   2  EFG       True                   True
5   3  HIJ      False                  False
6   1  IJK       True                   True
7   2  JKL       True                  False

Upvotes: 1

Related Questions