Reputation: 18953
I've been trying to figure this out for like 2 days. In my example below, do you see how the Ad state
is BOTH paused
and enabled
for AD ID
13756121050
in the top two rows? Well, in my case, each Ad ID
can only have either paused
or enabled
throughout the dataframe.
I'm trying to figure out how to make new DataFrame containing the conflicting rows where the SAME Ad ID
has both paused
and enabled
Ad state
.
The result would be the top two rows you see below because they share the same Ad ID and the have conflicting Ad states (paused, and enabled which is impossible for my purposes).
import pandas as pd
d = {'Ad state': {1092: 'enabled', 216: 'enabled', 1124: 'enabled', 213: 'paused', 1094: 'enabled', 1128: 'enabled', 1129: 'paused', 1131: 'paused', 236: 'enabled', 238: 'enabled'}, 'Based on': {1092: 'CTR', 216: 'Conv. rate', 1124: 'Conv. rate', 213: 'CTR', 1094: 'Conv. rate', 1128: 'CTR', 1129: 'CTR', 1131: 'Conv. rate', 236: 'CTR', 238: 'Conv. rate'}, 'Ad ID': {1092: 13756190410, 216: 13756121050, 1124: 13756111570, 213: 13756121050, 1094: 13756190410, 1128: 13756111570, 1129: 13756111690, 1131: 13756111690, 236: 13756120810, 238: 13756120810}}
df = pd.DataFrame(d, columns=['Ad ID', 'Ad state', 'Based on'])
#df prints this out
Ad ID Ad state Based on
213 13756121050 paused Conv. rate
216 13756121050 enabled CTR
236 13756120810 enabled CTR
238 13756120810 enabled Conv. rate
1092 13756190410 enabled CTR
1094 13756190410 enabled Conv. rate
1124 13756111570 enabled Conv. rate
1128 13756111570 enabled CTR
1129 13756111690 paused CTR
1131 13756111690 paused Conv. rate
pd.pivot_table(df, index='Ad ID', columns='Ad state', aggfunc='count', fill_value='')
#See how 13756121050 has BOTH enabled and paused status?
Based on
Ad state enabled paused
Ad ID
13756111570 2
13756111690 2
13756120810 2
13756121050 1 1
13756190410 2
Upvotes: 0
Views: 162
Reputation: 109666
If you want to have a field in your original database to indicated whether or not the data is consistent, you can use transform
.
df['consistent'] = df.groupby(['Ad_ID'])['Ad_state'].transform(lambda x: x.nunique() <= 1)
>>> df
Ad_ID Ad_state Based_on consistent
213 13756121050 paused Conv. rate False
216 13756121050 enabled CTR False
236 13756120810 enabled CTR True
238 13756120810 enabled Conv. rate True
1092 13756190410 enabled CTR True
1094 13756190410 enabled Conv. rate True
1124 13756111570 enabled Conv. rate True
1128 13756111570 enabled CTR True
1129 13756111690 paused CTR True
1131 13756111690 paused Conv. rate True
Upvotes: 1
Reputation: 24752
You can use .groupby
with .filter
, where the filter rule applies to each subgroup.
df.groupby('Ad ID').filter(lambda g: len(g['Ad state'].unique()) > 1)
Ad ID Ad state Based on
213 13756121050 paused CTR
216 13756121050 enabled Conv. rate
Upvotes: 4