Marina

Reputation: 349

Python DataFrame - deleting rows with column values belonging to lists of values

I am looking for a solution to the following problem. There's a DataFrame:

import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
                ['row1', 1, 2],
                ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:,1:], index=data[1:,0],columns=data[0,1:])

I wish to retain rows in which, for example, the value in column col1 belongs to the list [1, 2] and the value in column col2 belongs to the list [2, 4]. This is what I thought would work:

df1 = df[df['col1'].isin([1,2]) & df['col2'].isin([2,4])]

However df1 prints as an Empty DataFrame. On the other hand, this approach

df1 = df[(df.col1 in [1,2]) & (df.col2 in [2,4])]

results in

ValueError: The truth value of a Series is ambiguous. Use a.empty, `a.bool()`, `a.item()`, `a.any()` or `a.all()`.

I would expect to get a DataFrame containing row1. Needless to say, I am relatively new to Python. Thanks a lot for your help.

Upvotes: 3

Views: 541

Answers (2)

BENY

Reputation: 323366

Your column type is object because you created the data with np.array, which allows only a single dtype per array, so the numbers were stored as strings:

df.applymap(type)
Out[139]: 
               col1           col2
row1  <class 'str'>  <class 'str'>
row2  <class 'str'>  <class 'str'>

Create the DataFrame this way instead:

df = pd.DataFrame(data=[[1,2],[3,4]], index=['row1','row2'],columns=['col1','col2'])
df[(df['col1'].isin([1,2])) & (df['col2'].isin([2,4]))]
Out[143]: 
      col1  col2
row1     1     2

Upvotes: 2

jpp

Reputation: 164813

You need to convert numeric series to numeric types:

df = pd.DataFrame(data=data[1:,1:].astype(int),
                  index=data[1:,0],
                  columns=data[0,1:])

df1 = df[df['col1'].isin([1,2]) & df['col2'].isin([2,4])]

print(df1)

      col1  col2
row1     1     2

Your code does not work because NumPy coerced the mixed array to a single string dtype, so the DataFrame columns hold strings rather than integers and isin([1, 2]) finds no match. Pandas does not apply conversion implicitly as this would be prohibitively expensive in most situations.
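As a rough illustration of the mismatch, the sketch below rebuilds the DataFrame from the question and compares an integer lookup with a string lookup. The names int_mask and str_mask are made up for this example.

```python
import numpy as np
import pandas as pd

# Same construction as in the question: strings and ints mixed in one array.
data = np.array([['', 'col1', 'col2'],
                 ['row1', 1, 2],
                 ['row2', 3, 4]])

# NumPy stores everything under one dtype, here a Unicode string dtype.
print(data.dtype.kind)

df = pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])

# Looking up integers in a column of strings matches nothing.
int_mask = df['col1'].isin([1, 2])

# Looking up the string forms matches row1.
str_mask = df['col1'].isin(['1', '2'])

print(int_mask.tolist())
print(str_mask.tolist())
```

Filtering on the string forms would technically work, but converting the columns to numeric types is usually the cleaner fix.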

If you already have a constructed Pandas dataframe, you can apply numeric conversion as a separate step:

df = df.astype(int)

Or, to convert only specified series:

cols = ['col1', 'col2']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
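With errors='coerce', values that cannot be parsed as numbers become NaN instead of raising an error. A minimal sketch, using a hypothetical DataFrame with one deliberately bad value ('x') that is not part of the original question:

```python
import pandas as pd

# Hypothetical data: 'x' is an unparseable value added for illustration.
df = pd.DataFrame({'col1': ['1', 'x'], 'col2': ['2', '4']},
                  index=['row1', 'row2'])

cols = ['col1', 'col2']
converted = df[cols].apply(pd.to_numeric, errors='coerce')

# 'x' has become NaN, so col1 is now a float column.
print(converted)

# The original filter now works on the converted columns.
kept = converted[converted['col1'].isin([1, 2]) & converted['col2'].isin([2, 4])]
print(kept)
```

Rows holding NaN simply fail the isin test, so they drop out of the filtered result without any extra handling.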

Upvotes: 4
