Creating dataframe based on condition of two columns

Question

Hi I have the below table:

name: col1: ... col10
 a     bill      abc
 b     bill      hfb
 c     aaa       asd
 a     jim       fsa
 b     aaa       gds
 a     jim       ags

I am trying to create a new table that only shows rows where 'bill' from df['col1'] hasn't shown in df['name']. So for example, bill appears in the first 2 rows and the rows with the same values in the name column as ones that correspond with occurrences with bill are removed from the df. An example output is shown below.

name: col1: ... col10
 c     aaa       asd

I have tried this:

new_df = df[df['col1']!='bill']

But this returns:

name: col1: ... col10
 c     aaa       asd
 a     jim       fsa
 b     aaa       gds
 a     jim       ags

Andrej Kesely · Accepted Answer

Try:

x = df[~df["name"].isin(df.loc[df.col1.eq("bill"), "name"])]
print(x)

Prints:

  name col1 col10
2    c  aaa   asd

STEPS:

Select all "Names" where col1=="bill":

df.loc[df.col1.eq("bill"), "name"]

0    a
1    b
Name: name, dtype: object

Create boolean mask selecting elements sharing "name" with "bills"

df["name"].isin(df.loc[df.col1.eq("bill"), "name"])

0     True
1     True
2    False
3     True
4     True
5     True
Name: name, dtype: bool

Select elements that don't share the name with "bill" with negation of this mask:

df[~df["name"].isin(df.loc[df.col1.eq("bill"), "name"])]

  name col1 col10
2    c  aaa   asd

Creating dataframe based on condition of two columns

Answers (1)

Related Questions