Reputation: 1193
This is my dataframe:
d = {'id':['a','a','a','b','b','b','c','c','c','d','d','d'],
'seg':[1,2,3,1,2,3,1,2,3,1,2,3],
'type':['y','y','y','z','z','z','y1','y1','z','z','z','y2']
}
df = pd.DataFrame(d)
I am trying to filter id based on the following conditions:
z
y
y
and z
For first and second condition I can do the following:
df_y = df.loc[df['type'].str.contains('y')]
df_z = df.loc[df['type'] == 'z']
I can then take unique
for id column.
How can I check if both y
and z
exist in type
column for the same id?
Outputs:
df_y
id
a
c
d
df_z
id
b
Required outputs:
df_y
id
a
df_z
id
b
df_y_z
id
c
d
Upvotes: 1
Views: 721
Reputation: 153500
Updated:
df[df.groupby('id')['type']\
.transform(lambda x: x[x.str.contains('y|z')].nunique()) == 2]
Output:
id seg type
6 c 1 y1
7 c 2 y1
8 c 3 z
9 d 1 z
10 d 2 z
11 d 3 y2
Note: you can use contains or startwith if you just want those types that start with y or z.
IIUC, to get df_y_z, first filter your dataframe down to only rows with y or z if you have more types, then groupby id, and count the number of unique types if equal to 2 then you have groups with both y and z:
df.loc[df.query('type in ["y","z"]').groupby('id')['type'].transform('nunique') == 2]
Output:
id seg type
6 c 1 y
7 c 2 y
8 c 3 z
9 d 1 z
10 d 2 z
11 d 3 y
Upvotes: 1
Reputation: 323316
We can do crosstab
+ dot
s=pd.crosstab(df.id,df.type).ne(0)
s.dot(s.columns)
Out[277]:
id
a y
b z
c yz
d yz
dtype: object
Upvotes: 2