Reputation: 461
I'm Just trying to understand to get the values of One DataFrame based on the specific column ie in this case column IDs
which is present in both DataFrames, i'm looking forward to match the values based on df1
s column IDs
with df2
s column IDs
So, if the values of df1.Keywords
is in df2.Name
but df2.Name having more then one value which would be a considerable point here print the Boolean values True
or False
based on that.
Note: df1.Keywords isin df2.Name
based on the IDs
on booth the DataFrames.
DataFrame First df1
>>> df1
IDs Keywords
0 1234 APPLE
1 1234 ORANGE
2 1234 LEMONS
3 5346 ORANGE
4 5346 STRAWBERRY
5 5346 BLUEBERRY
6 8793 TEA
DataFrame Second df2
>>> df2
IDs Name
0 1234 APPLE ABCD
1 5346 APPLE ABCD
2 1234 STRAWBERRY YES
3 8793 ORANGE AVAILABLE
4 8793 TEA AVAILABLE
Expected :
IDs Name New_Bools
1234 APPLE ABCD ONE True
5346 APPLE ABCD False
1234 STRAWBERRY YES False
8793 ORANGE AVAILABLE False
8793 TEA AVAILABLE False
8793 TEA COFFEE True
I have tried creating MultiIndex with isin
but not working.
index1 = pd.MultiIndex.from_arrays([df1[col] for col in ['IDs', 'Keywords']])
index2 = pd.MultiIndex.from_arrays([df2[col] for col in ['IDs', 'Name']])
df1.IDs.isin(df2.IDs)
Upvotes: 1
Views: 98
Reputation: 7410
Use merge
to associate Keywords
with Name
via IDs
, after you groupby
IDs
and Name
, apply
lambda
to calculate if any name contains
the keywords in its group:
df.merge(df2).groupby(['IDs','Name']).apply(lambda x: any(x['Name'].str.contains('|'.join(x['Keywords'])))).rename('flag').reset_index()
IDs Name flag
0 1234 APPLE ABCD True
1 1234 STRAWBERRY YES False
2 5346 APPLE ABCD False
3 8793 ORANGE AVAILABLE False
4 8793 TEA AVAILABLE True
Upvotes: 1
Reputation: 363
Depending on what you're trying to match (which seems like only Id from your question) Assuming you don't care about matching your Name
and Keyword
columns, then using isin
is appropriate and something like the below should give you your boolean flag.
import pandas as pd
data = {'Id': [1, 2, 3, 1, 2, 3, 1, 2], 'Val': ['ABC', 'BCD', 'CDE', 'DEF', 'EFG', 'HIJ', 'IJK', 'JKL']}
data2 = {'Id': [1, 4, 7, 1, 2, 0, 1, 5], 'Val': ['ABC pld', 'BCD iod', 'CDE jkdf', 'DEF uyt', 'EFG erf', 'HIJ dfd', 'IJK mnb', 'JKL jkdf']}
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df['New_Bools'] = df['Id'].isin(df2['Id'])
print(df)
Outputs
Id Val Id_In_DF2
0 1 ABC True
1 2 BCD True
2 3 CDE False
3 1 DEF True
4 2 EFG True
5 3 HIJ False
6 1 IJK True
7 2 JKL True
If you do care about matching on the names (your expected output), you'll need to do additional string manipulation to isolate the fruits in df2. Continuing with the data in my example, you'll be able to isolate your fruits using apply
, lambda
, and split
on your strings. Index 0
will give you your fruits.
Lastly, you can then use a mask to compare the data between the two frames and output your boolean.
# string manipulation for df2
df2['Name'] = df2['Val'].apply(lambda x: x.split(' ')[0])
# perform your eval like this:
df['New_Bools_Id_And_Name'] = ((df['Id'] == df2['Id']) & (df['Val'] == df2['Name']))
Which outputs
Id Val New_Bools New_Bools_Id_And_Name
0 1 ABC True True
1 2 BCD True False
2 3 CDE False False
3 1 DEF True True
4 2 EFG True True
5 3 HIJ False False
6 1 IJK True True
7 2 JKL True False
Upvotes: 1