Reputation: 13
I have the following data in a pandas DataFrame:
import pandas as pd
df = pd.read_csv('example_data_file.csv')
df.head()
ID Year status
223725 1991 No
223725 1992 No
223725 1993 No
223725 1994 No
223725 1995 No
I want to replace the values in the column status, which has the values Yes and No, for an ID based on the following condition:
If an ID has at least one Yes in the column status, then all observations (including No) in the column status specific to that ID are replaced with Yes. Otherwise, they remain unchanged.
For example, in the DataFrame below, 844272365 has Yes in status in the last row, so all previous observations in status for 844272365 should be replaced with Yes.
ID Year status
844272365 1991 No
844272365 1992 No
844272365 1993 No
844272365 1994 No
844272365 1995 No
844272365 1996 No
844272365 1997 No
844272365 1998 No
844272365 1999 No
844272365 2000 No
844272365 2001 No
844272365 2002 No
844272365 2003 No
844272365 2004 No
844272365 2005 No
844272365 2006 No
844272365 2007 No
844272365 2008 No
844272365 2010 No
844272365 2011 No
844272365 2012 No
844272365 2013 Yes
How do I make these replacements for many IDs in a DataFrame in accordance with the above condition?
Upvotes: 1
Views: 92
Reputation: 323226
Check transform with max:
'Yes'>'No' # this is the reason why max works
Out[433]: True
df['new_status'] = df.groupby('ID')['status'].transform('max')
df
Out[435]:
ID Year status new_status
0 844272365 1991 No Yes
1 844272365 1992 No Yes
2 844272365 1993 No Yes
3 844272365 1994 No Yes
4 844272365 1995 No Yes
5 844272365 1996 No Yes
6 844272365 1997 No Yes
7 844272365 1998 No Yes
8 844272365 1999 No Yes
9 844272365 2000 No Yes
10 844272365 2001 No Yes
11 844272365 2002 No Yes
12 844272365 2003 No Yes
13 844272365 2004 No Yes
14 844272365 2005 No Yes
15 844272365 2006 No Yes
16 844272365 2007 No Yes
17 844272365 2008 No Yes
18 844272365 2010 No Yes
19 844272365 2011 No Yes
20 844272365 2012 No Yes
21 844272365 2013 Yes Yes
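As a minimal sketch of the same idea on more than one ID, the snippet below builds a small made-up frame (the IDs and values are illustrative, not taken from the question's file) in which one ID has a Yes and the other does not. Note that this relies on string ordering, so it works for Yes/No but would need checking for other labels.

```python
import pandas as pd

# Hypothetical sample data: ID 1 has one 'Yes', ID 2 has none.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'Year': [1991, 1992, 1993, 1991, 1992],
    'status': ['No', 'No', 'Yes', 'No', 'No'],
})

# Per-group maximum of the strings; 'Yes' compares greater than 'No',
# so any group containing a 'Yes' gets 'Yes' on every row.
df['new_status'] = df.groupby('ID')['status'].transform('max')

result = df['new_status'].tolist()
print(df)
```

Assigning to df['status'] instead of df['new_status'] would overwrite the original column in place, which is what the question asks for.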
Upvotes: 1
Reputation: 21709
You can use transform:
df['new_status'] = (df
    .groupby('ID')['status']
    .transform(lambda x: 'Yes' if x.str.contains('Yes').any() else 'No'))
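A possible variant, sketched here on made-up data, avoids the Python-level lambda: it builds a boolean mask per row and only overwrites rows whose ID has at least one Yes, so every other row keeps whatever value it had. The name has_yes and the sample frame are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: ID 1 has one 'Yes', ID 2 has none.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'status': ['No', 'Yes', 'No', 'No', 'No'],
})

# True on every row whose ID group contains at least one 'Yes'.
has_yes = df['status'].eq('Yes').groupby(df['ID']).transform('any')

# Overwrite only the rows of those groups; other rows are left untouched.
df.loc[has_yes, 'status'] = 'Yes'

statuses = df['status'].tolist()
print(df)
```

Unlike the lambda, which writes No for every group without a Yes, this leaves such groups exactly as they were, which may matter if the column can hold other values.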
Upvotes: 1
Reputation: 10624
The following should work:
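# IDs that have at least one 'Yes'
s = set(df[df.status == 'Yes']['ID'])
for i in range(len(df)):
    if df.ID.iloc[i] in s:
        # write by position; df.status[i] = ... is chained assignment and uses labels
        df.iloc[i, df.columns.get_loc('status')] = 'Yes'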
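The loop above can likely be replaced by a vectorized version of the same logic, which should be faster on large frames. This is only a sketch on made-up data; the name ids_with_yes is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data: IDs 10 and 30 have a 'Yes', ID 20 does not.
df = pd.DataFrame({
    'ID': [10, 10, 20, 20, 30],
    'status': ['No', 'Yes', 'No', 'No', 'Yes'],
})

# Collect the IDs that have at least one 'Yes'.
ids_with_yes = df.loc[df['status'] == 'Yes', 'ID'].unique()

# Set status to 'Yes' on every row belonging to one of those IDs.
df.loc[df['ID'].isin(ids_with_yes), 'status'] = 'Yes'

statuses = df['status'].tolist()
print(df)
```

Using a single df.loc assignment also avoids the chained-assignment pattern, which pandas warns about and which may not modify the original frame.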
Upvotes: 0