Reputation: 884
I need to filter rows from a dataframe that include matching pairs of strings. For example if the below instance when filtered only data for IDs 1 and 2 would be kept as 3 does not have a corresponding '3 Month' for the '0 Month' entry:
df = pd.DataFrame({'ID':[1,2,3,1,2,1], 'Period':['0 Month','0 Month','0 Month','3 Month','3 Month','6 Month']})
The OR operation can easily be used to filter for 2 strings, as below, but that does not drop the ID without the requisite pair.
df = df[(df["Period"].str.contains("0 Month")) | (df["Period"].str.contains("3 Month"))]
df
Therefore I'm attempting to use the AND operator to address this need but that is returning an empty dataframe:
df = df[(df["Period"].str.contains("0 Month")) & (df["Period"].str.contains("3 Month"))]
df
Upvotes: 0
Views: 205
Reputation:
You can groupby
"ID" and the condition and transform nunique
method to count the number of unique "Period"s and filter the rows with more than 1 unique "Period" values:
out = df[df.groupby(['ID', (df["Period"].str.contains("0 Month") | df["Period"].str.contains("3 Month"))])['Period'].transform('nunique') > 1]
Note that, instead of |
you can use isin
:
out = df[df.groupby(['ID', df["Period"].isin(['0 Month', '3 Month'])])['Period'].transform('nunique') > 1]
or combine the strings to match inside str.contains
:
out = df[df.groupby(['ID', df["Period"].str.contains('0|3')])['Period'].transform('nunique') > 1]
Output:
ID Period
0 1 0 Month
1 2 0 Month
3 1 3 Month
4 2 3 Month
Upvotes: 1