Suhas_mudam

Reputation: 195

Get the True row when the same Item has both True and False rows, using pandas

I have a dataframe with this data.

import pandas as pd

data = {'Item':['2', '1', '2'],
    'IsAvailable':['True', 'False', 'False']}
df = pd.DataFrame(data)

Item  |  IsAvailable
---------------------
  2   |     True
  1   |     False
  2   |     False

The dataframe looks like the table above. As you can see, Item 2 has both a True row and a False row. In that case I want a single record with just True.

Expected output:

Item  |  IsAvailable
---------------------
  2   |     True
  1   |     False

Please help me write the condition for this using pandas.

Thanks

Upvotes: 0

Views: 687

Answers (4)

Nanna

Reputation: 575

Here is a solution that checks whether the string 'True' is among the values for each item. If it is, the result for that item is True.

>>> df.groupby(['Item'])['IsAvailable'].apply(lambda x: 'True' in set(x))
Item
1    False
2     True
Name: IsAvailable, dtype: bool

If you want Item back as a regular column (a DataFrame instead of a Series), use

>>> df.groupby(['Item'])['IsAvailable'].apply(lambda x: 'True' in set(x)).reset_index()
  Item  IsAvailable
0    1        False
1    2         True
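The same check can be written without a lambda: convert the column to real booleans and take `any()` per group, which is True as soon as one row of the Item is True. This is a minimal sketch, assuming the strings are spelled exactly 'True'/'False'; `sort=False` is an addition here that keeps the items in order of first appearance instead of sorting them.

```python
import pandas as pd

data = {'Item': ['2', '1', '2'],
        'IsAvailable': ['True', 'False', 'False']}
df = pd.DataFrame(data)

# Turn the 'True'/'False' strings into real booleans (assumes no other spellings)
flags = df['IsAvailable'].eq('True')

# any() per Item: True if at least one row of that Item is True
out = (flags.groupby(df['Item'], sort=False)
            .any()
            .reset_index())
print(out)
```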

Upvotes: 0

Oleg O

Reputation: 1055

Since bool is a subclass of int (False == 0, True == 1), sorting puts the True rows last, and keep='last' then picks them:

df = df.sort_values('IsAvailable').drop_duplicates(subset=['Item'], keep='last')

This reorders your rows, of course. Funny thing: it also works with 'True'/'False' strings, since 'False' sorts before 'True' alphabetically.
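If the reordering is a problem, calling `sort_index()` after dropping duplicates puts the surviving rows back in their original order. A minimal sketch of that variant; `kind='stable'` and the final `sort_index()` are additions here, not part of the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['True', 'False', 'False']})

# 'False' < 'True' both as strings and as booleans, so True rows sort last;
# a stable sort keeps tied rows in their original relative order
out = (df.sort_values('IsAvailable', kind='stable')
         .drop_duplicates(subset=['Item'], keep='last')
         .sort_index())  # restore the original row order
print(out)
```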

Upvotes: 1

braml1

Reputation: 584

If you just want the first occurrence of each Item (edit: as per @jezrael, you may want to map your strings to booleans first):

df['IsAvailable'] = df['IsAvailable'].replace({'True':True, 'False':False})
dfOut = df.drop_duplicates(subset="Item", keep='first')
print(dfOut)

  Item IsAvailable
0    2        True
1    1       False
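Note that keep='first' returns True for Item 2 here only because its True row happens to come first. The sketch below uses a hypothetical reordering of the question's data (Item 2's False row listed first) to show the difference; sorting True rows to the top before dropping duplicates is an added step that makes the result independent of row order. It uses `map` rather than `replace` for the string-to-bool conversion.

```python
import pandas as pd

# Hypothetical reordering of the question's data: Item 2's False row comes first
df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['False', 'False', 'True']})
df['IsAvailable'] = df['IsAvailable'].map({'True': True, 'False': False})

naive = df.drop_duplicates(subset='Item', keep='first')
print(naive)  # Item 2 keeps its False row

# Put True rows first (stable sort, so ties keep their order), keep the first
# row per Item, then restore the original row order
fixed = (df.sort_values('IsAvailable', ascending=False, kind='stable')
           .drop_duplicates(subset='Item', keep='first')
           .sort_index())
print(fixed)
```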

Upvotes: 0

jezrael

Reputation: 862611

I think you first need to convert the strings 'True' and 'False' to booleans (if necessary), then get the index of the first True row per group with DataFrameGroupBy.idxmax and select those rows with DataFrame.loc:

df['IsAvailable'] = df['IsAvailable'].map({'True':True, 'False':False})

df = df.loc[df.groupby('Item', sort=False)['IsAvailable'].idxmax()]
print (df)
  Item  IsAvailable
0    2         True
1    1        False
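Why this works: idxmax returns the label of the first row holding the group's maximum, so an Item with any True row yields its first True row, and an Item with only False rows yields its first row. Because the result is index labels fed to `loc`, it assumes the frame has a unique index (the default RangeIndex does). A minimal self-contained sketch of the same idea; the `astype('int8')` cast is a precaution added here, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['True', 'False', 'False']})
df['IsAvailable'] = df['IsAvailable'].map({'True': True, 'False': False})

# Label of the first row with the largest flag per Item (cast to int8 as a
# precaution so idxmax works on plain integers); sort=False keeps first-seen order
rows = df['IsAvailable'].astype('int8').groupby(df['Item'], sort=False).idxmax()

out = df.loc[rows]  # select the chosen rows by index label
print(out)
```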

Upvotes: 0
