tmo12345
tmo12345

Reputation: 11

Removing rows from one dataframe that appear in a second or third dataframe

first question on here; I'm currently working on a project that involves 3 dataframes (df,df2,df3), the first being the largest and the only one that needs to be changed. Basically, if a name appears in column "name" for df2 or column "person" in df3, the entire row needs to be dropped from df.

Also there all in dataframes and not lists, I don't know if that helps clarify.

data = [['tom', 10], ['nick', 15], ['juli', 14], ['james', 45],['ted', 15], ['roger', 14], ['steve', 45]] 

df = pd.DataFrame(data, columns = ['Name', 'Age']) 
    Name    Age
0   tom     10
1   nick    15
2   juli    14
3   james   45
4   ted     15
5   roger   14
6   steve   45
data2 = [['teddy', 150], ['nick', 135], ['joshua', 140], ['jeremy', 45],['ted', 200], ['george', 147], ['steve', 145]]

df2 = pd.DataFrame(data2, columns = ['Name', 'Weight']) 
  Name  Weight
0   teddy   150
1   nick    135
2   joshua  140
3   jeremy  45
4   ted 200
5   george  147
6   steve   145
data3 = [['234234','blue','ted'], ['483765','brown','roger'], ['958768','green','roger']]

df3 = pd.DataFrame(data3, columns = ['ID_Number','Eye Color', 'person']) 
    ID_Number   Eye Color   person
0   234234      blue        ted
1   483765      brown       roger
2   958768      green       roger

Ideally I'd like to drop from df (the first one) all of the rows containing the names repeated under df2.Name and df3.person

Essentially the end product of df would contain rows for only tom, juli, and james and would show there ages and wouldn't matter if it showed more rows that'd be okay.

Upvotes: 1

Views: 43

Answers (3)

sammywemmy
sammywemmy

Reputation: 28649

Get the unique names from df2 and df3, and exclude them from df

val = df2['Name'].unique()
val1 = df3['person'].unique()

df.query("Name not in @val1 and Name not in @val")

    Name    Age
0   tom     10
2   juli    14
3   james   45

Upvotes: 0

wwnde
wwnde

Reputation: 26676

  # Add the df3.person and df2.Name to a list and merge the list using + operator

l=df3.person.values.tolist() + df2.Name.values.tolist()
#Check membership using the .isin method and mask those members found

df[~df.Name.isin(l)]

Chained solution

df[~df.Name.isin(df3.person.values.tolist() + df2.Name.values.tolist())]
    Name    Age
0   tom     10
2   juli    14
3   james   45

Upvotes: 0

NYC Coder
NYC Coder

Reputation: 7594

Use isin for this:

df = df[~(df['Name'].isin(df2['Name'])) & ~(df['Name'].isin(df3['person']))]
print(df)

    Name  Age
0    tom   10
2   juli   14
3  james   45

Or you can use np.where:

df = df.loc[np.where(~(df['Name'].isin(df2['Name'])) & ~(df['Name'].isin(df3['person'])))]
print(df)

    Name  Age
0    tom   10
2   juli   14
3  james   45

Upvotes: 1

Related Questions