Reputation: 11
first question on here; I'm currently working on a project that involves 3 dataframes (df,df2,df3), the first being the largest and the only one that needs to be changed. Basically, if a name appears in column "name" for df2 or column "person" in df3, the entire row needs to be dropped from df.
Also there all in dataframes and not lists, I don't know if that helps clarify.
data = [['tom', 10], ['nick', 15], ['juli', 14], ['james', 45],['ted', 15], ['roger', 14], ['steve', 45]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
Name Age
0 tom 10
1 nick 15
2 juli 14
3 james 45
4 ted 15
5 roger 14
6 steve 45
data2 = [['teddy', 150], ['nick', 135], ['joshua', 140], ['jeremy', 45],['ted', 200], ['george', 147], ['steve', 145]]
df2 = pd.DataFrame(data2, columns = ['Name', 'Weight'])
Name Weight
0 teddy 150
1 nick 135
2 joshua 140
3 jeremy 45
4 ted 200
5 george 147
6 steve 145
data3 = [['234234','blue','ted'], ['483765','brown','roger'], ['958768','green','roger']]
df3 = pd.DataFrame(data3, columns = ['ID_Number','Eye Color', 'person'])
ID_Number Eye Color person
0 234234 blue ted
1 483765 brown roger
2 958768 green roger
Ideally I'd like to drop from df (the first one) all of the rows containing the names repeated under df2.Name
and df3.person
Essentially the end product of df would contain rows for only tom, juli, and james and would show there ages and wouldn't matter if it showed more rows that'd be okay.
Upvotes: 1
Views: 43
Reputation: 28649
Get the unique names from df2
and df3
, and exclude them from df
val = df2['Name'].unique()
val1 = df3['person'].unique()
df.query("Name not in @val1 and Name not in @val")
Name Age
0 tom 10
2 juli 14
3 james 45
Upvotes: 0
Reputation: 26676
# Add the df3.person and df2.Name to a list and merge the list using + operator
l=df3.person.values.tolist() + df2.Name.values.tolist()
#Check membership using the .isin method and mask those members found
df[~df.Name.isin(l)]
Chained solution
df[~df.Name.isin(df3.person.values.tolist() + df2.Name.values.tolist())]
Name Age
0 tom 10
2 juli 14
3 james 45
Upvotes: 0
Reputation: 7594
Use isin
for this:
df = df[~(df['Name'].isin(df2['Name'])) & ~(df['Name'].isin(df3['person']))]
print(df)
Name Age
0 tom 10
2 juli 14
3 james 45
Or you can use np.where
:
df = df.loc[np.where(~(df['Name'].isin(df2['Name'])) & ~(df['Name'].isin(df3['person'])))]
print(df)
Name Age
0 tom 10
2 juli 14
3 james 45
Upvotes: 1