bigmacattack
bigmacattack

Reputation: 35

Creating dataframe based on condition of two columns

Hi I have the below table:

name: col1: ... col10
 a     bill      abc
 b     bill      hfb
 c     aaa       asd
 a     jim       fsa
 b     aaa       gds
 a     jim       ags

I am trying to create a new table that only shows rows where 'bill' from df['col1'] hasn't shown in df['name']. So for example, bill appears in the first 2 rows and the rows with the same values in the name column as ones that correspond with occurrences with bill are removed from the df. An example output is shown below.

name: col1: ... col10
 c     aaa       asd

I have tried this:

new_df = df[df['col1']!='bill']

But this returns:

name: col1: ... col10
 c     aaa       asd
 a     jim       fsa
 b     aaa       gds
 a     jim       ags

Upvotes: 1

Views: 128

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195438

Try:

x = df[~df["name"].isin(df.loc[df.col1.eq("bill"), "name"])]
print(x)

Prints:

  name col1 col10
2    c  aaa   asd

STEPS:

  1. Select all "Names" where col1=="bill":
df.loc[df.col1.eq("bill"), "name"]

0    a
1    b
Name: name, dtype: object
  1. Create boolean mask selecting elements sharing "name" with "bills"
df["name"].isin(df.loc[df.col1.eq("bill"), "name"])

0     True
1     True
2    False
3     True
4     True
5     True
Name: name, dtype: bool
  1. Select elements that don't share the name with "bill" with negation of this mask:
df[~df["name"].isin(df.loc[df.col1.eq("bill"), "name"])]

  name col1 col10
2    c  aaa   asd

Upvotes: 1

Related Questions