Reputation: 195
I have a dataframe with this data.
import pandas as pd
data = {'Item':['2', '1', '2'],
'IsAvailable':['True', 'False', 'False']}
df = pd.DataFrame(data)
================================
Item | IsAvailable
---------------------
2 | True
1 | False
2 | False
As shown above, Item 2 has both a True and a False row. In that case I want to keep a single record with just True.
Expected output:
Item | IsAvailable
---------------------
2 | True
1 | False
Please help me write the condition for this using pandas in Python.
Thanks
Upvotes: 0
Views: 687
Reputation: 575
Here is a solution where we check if the value True is one of the values assigned to each item. If so, the outcome is also True.
>>> df.groupby(['Item'])['IsAvailable'].apply(lambda x: 'True' in set(x))
Item
1 False
2 True
Name: IsAvailable, dtype: bool
If you want Item back as a regular column rather than as the index, use
>>> df.groupby(['Item'])['IsAvailable'].apply(lambda x: 'True' in set(x)).reset_index()
Item IsAvailable
0 1 False
1 2 True
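As a variation on the same idea, here is a minimal sketch that converts the strings to real booleans first and then uses any() per group. It assumes the column only ever contains the strings 'True' and 'False'; the names available and result are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['True', 'False', 'False']})

# Convert the 'True'/'False' strings to real booleans
# (assumption: no other values occur in the column).
available = df['IsAvailable'].map({'True': True, 'False': False})

# any() is True for an item when at least one of its rows is True.
result = available.groupby(df['Item']).any().reset_index()
print(result)
```

Like the groupby above, this sorts the result by Item rather than keeping the original row order.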
Upvotes: 0
Reputation: 1055
Since bool behaves like int (False is 0, True is 1), False sorts before True, so you can sort and keep the last row per item:
df = df.sort_values('IsAvailable').drop_duplicates(subset=['Item'], keep='last')
This will reorder your rows though, of course. Funny thing: it works even when the values are the strings 'True'/'False', because 'False' sorts before 'True' alphabetically.
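If the reordering matters, one possible fix (a sketch, assuming the original index reflects the order you want to keep) is to sort by index again after dropping the duplicates. The name deduped is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': ['True', 'False', 'False']})

deduped = (df.sort_values('IsAvailable')                      # 'False' rows first, 'True' rows last
             .drop_duplicates(subset=['Item'], keep='last')  # keep the last row per item
             .sort_index())                                  # restore the original row order
print(deduped)
```

Each item ends up on the index label of the row that was kept, so the result follows the position of the surviving rows in the original frame.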
Upvotes: 1
Reputation: 584
If you just want the first occurrence (edit: as per @jezrael, you may want to map your strings to booleans first):
df['IsAvailable'] = df['IsAvailable'].replace({'True':True, 'False':False})
dfOut = df.drop_duplicates(subset="Item", keep='first')
print(dfOut)
Item IsAvailable
0 2 True
1 1 False
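Note that keep='first' keeps whichever row comes first for each item, so it only gives True here because the True row for Item 2 happens to be first. A small sketch with a hypothetical reordering of the question's data (not the original data) shows the difference, along with one way to make it order-independent:

```python
import pandas as pd

# Hypothetical variant of the question's data: the False row for Item 2 comes first.
df = pd.DataFrame({'Item': ['2', '1', '2'],
                   'IsAvailable': [False, False, True]})

# keep='first' simply keeps the first row per item, so Item 2 stays False here.
first = df.drop_duplicates(subset='Item', keep='first')
print(first)

# Sorting the True rows to the front first makes keep='first' pick True where one exists.
true_first = (df.sort_values('IsAvailable', ascending=False, kind='stable')
                .drop_duplicates(subset='Item', keep='first'))
print(true_first)
```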
Upvotes: 0
Reputation: 862611
I think you first need to replace the strings True and False with booleans if necessary, and then get the first row with True per group, using DataFrameGroupBy.idxmax for the indices and selecting them with DataFrame.loc:
df['IsAvailable'] = df['IsAvailable'].map({'True':True, 'False':False})
df = df.loc[df.groupby('Item', sort=False)['IsAvailable'].idxmax()]
print (df)
Item IsAvailable
0 2 True
1 1 False
Upvotes: 0