Remove random N number of rows based on conditions on multiple columns in pandas

Question

df

    Text column  Title     Numbers column
0          abc   rom-com               1
1          xyz    comedy               2
2           hi   rom-com               4
3          jkl    murder               5
4          abc  thriller               2
and so on...

What I want:

I want to remove 5 random rows where the Title column has the value 'rom-com', and 6 random rows where the Title column has the value 'murder'.

Code:

df1 = df.drop(df[df['Title'].str.contains('rom-com')].sample(5).index & \
[df['Title'].str.contains('murder')].sample(6).index)

Error:

AttributeError: 'list' object has no attribute 'sample'

The code works for one title at a time, but not for both together:

# this alone works for murder and for rom-com separately
df1 = df.drop(df[df['Title'].str.contains('rom-com')].sample(5).index)

But I am not able to remove the rows for both Title values together in one step.

jezrael · Accepted Answer

It is possible with Index.union:

df1 = df.drop(df[df['Title'].str.contains('rom-com')].sample(5).index.union(df[df['Title'].str.contains('murder')].sample(6).index))
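
For context, the error in the question comes from the second operand: `[df['Title'].str.contains('murder')]` is a plain Python list wrapping a boolean Series (the leading `df` is missing), and a list has no `.sample` method. Even with `df` restored, `&` would not give the combined set of labels, which is why a union is used. Below is a minimal, self-contained sketch of the same approach split into steps. The sample data, the intermediate names `rom_com_idx` and `murder_idx`, and the `random_state` argument are assumptions added for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data, shaped like the frame in the question.
df = pd.DataFrame({
    "Text column": [f"text{i}" for i in range(30)],
    "Title": ["rom-com"] * 10 + ["murder"] * 10 + ["comedy"] * 10,
    "Numbers column": list(range(30)),
})

# Sample the index labels to drop for each title separately.
# random_state is only here to make the sketch reproducible.
rom_com_idx = df[df["Title"].str.contains("rom-com")].sample(5, random_state=0).index
murder_idx = df[df["Title"].str.contains("murder")].sample(6, random_state=0).index

# Union the two sets of labels and drop them in a single call.
df1 = df.drop(rom_com_idx.union(murder_idx))

print(df1["Title"].value_counts())
```

Note that `sample(n)` raises a `ValueError` when fewer than `n` rows match, so each title needs at least as many rows as are being removed. `str.contains` matches substrings; if an exact match is wanted, `df['Title'] == 'rom-com'` may be a better fit.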
