nfalesit

Reputation: 115

Removing values from column within groups based on conditions

I am really struggling with this even though I feel like it should be extremely easy.

I have a dataframe that looks like this:

Title     Release Date  Released  In Stores
Seinfeld  1995
Seinfeld  1999          Yes
Seinfeld  1999                    Yes
Friends   2000          Yes
Friends   2004                    Yes
Friends   2004

I first group by Title and then by Release Date, and then look at the values of Released and In Stores within each group. If a group contains a "Yes" in both Released and In Stores, the "Yes" in In Stores should be removed.

So in the above dataframe, the group Seinfeld --> 1999 would have the "Yes" removed from In Stores, but the "Yes" would stay in In Stores for Friends --> 2004, since it is the only "Yes" in that group.
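To make the rule concrete, here is a minimal sketch that rebuilds the sample frame and lists the groups where both columns contain a "Yes". The placement of each "Yes" in the frame is an assumption inferred from the rule and the desired output, and the names `keys`, `is_yes`, `per_group`, and `both_yes` are only illustrative.

```python
import numpy as np
import pandas as pd

# Reconstruction of the sample frame; which column each "Yes" sits in
# is an assumption inferred from the desired output.
df = pd.DataFrame({
    'Title': ['Seinfeld', 'Seinfeld', 'Seinfeld', 'Friends', 'Friends', 'Friends'],
    'Release Date': [1995, 1999, 1999, 2000, 2004, 2004],
    'Released': [np.nan, 'Yes', np.nan, 'Yes', np.nan, np.nan],
    'In Stores': [np.nan, np.nan, 'Yes', np.nan, 'Yes', np.nan],
})

keys = ['Title', 'Release Date']

# True where the cell equals 'Yes'; missing values compare as False.
is_yes = df[['Released', 'In Stores']].eq('Yes')

# One row per (Title, Release Date) group: does each column contain a 'Yes'?
per_group = is_yes.groupby([df[k] for k in keys]).any()

# Groups where both columns contain a 'Yes' are the ones to clean up.
both_yes = per_group.all(axis=1)
print(both_yes[both_yes].index.tolist())
```

On this reconstruction only the Seinfeld / 1999 group qualifies, which matches the description above.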

I am starting by using

df.groupby(['Title', 'Release Date'])[['Released', 'In Stores']].count()

But I cannot figure out the syntax for removing values from In Stores.

Desired output:

Title     Release Date  Released  In Stores
Seinfeld  1995
Seinfeld  1999          Yes
Seinfeld  1999
Friends   2000          Yes
Friends   2004                    Yes
Friends   2004

EDIT: I have tried this line given in the top comment:

flag = df.groupby(['Title', 'Release Date']).transform(lambda x: (x == 'Yes').any()).all(axis=1)

but the kernel runs indefinitely.

Upvotes: 1

Views: 308

Answers (1)

Peter Leimbigler

Reputation: 11105

You can use groupby.transform to flag rows where In Stores needs to be removed, based on whether the row's ['Title', 'Release Date'] group has at least one value of 'Yes' in column Released, and also in column In Stores.

flag = (df.groupby(['Title', 'Release Date'])
          .transform(lambda x: (x == 'Yes').any())
          .all(axis=1))

print(flag)
0    False
1     True
2     True
3    False
4    False
5    False
dtype: bool

import numpy as np  # needed for np.nan

df.loc[flag, 'In Stores'] = np.nan

Result:

Title     Release Date  Released  In Stores
Seinfeld  1995          nan       nan
Seinfeld  1999          Yes       nan
Seinfeld  1999          nan       nan
Friends   2000          Yes       nan
Friends   2004          nan       Yes
Friends   2004          nan       nan
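If the lambda-based transform is slow on a large frame, one possible variation is to do the 'Yes' comparison once up front and then use the built-in 'any' reduction instead of a Python lambda. This is a sketch under assumptions, not a benchmark: the sample frame below is rebuilt by hand from the question (the placement of each "Yes" is inferred from the result above), and `keys`, `is_yes`, and `group_has_yes` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; cell placement is an assumption.
df = pd.DataFrame({
    'Title': ['Seinfeld', 'Seinfeld', 'Seinfeld', 'Friends', 'Friends', 'Friends'],
    'Release Date': [1995, 1999, 1999, 2000, 2004, 2004],
    'Released': [np.nan, 'Yes', np.nan, 'Yes', np.nan, np.nan],
    'In Stores': [np.nan, np.nan, 'Yes', np.nan, 'Yes', np.nan],
})

keys = ['Title', 'Release Date']

# Compare once, outside the groupby; missing values compare as False.
is_yes = df[['Released', 'In Stores']].eq('Yes')

# Built-in 'any' reduction per group, broadcast back to the original rows.
group_has_yes = is_yes.groupby([df[k] for k in keys]).transform('any')

# A row is flagged when its group has a 'Yes' in both columns.
flag = group_has_yes.all(axis=1)

df.loc[flag, 'In Stores'] = np.nan
print(df)
```

Only the two value columns are compared here, so any other columns in the real frame are left out of the check.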

Upvotes: 1
