For an id in multiple rows find if values exists in pandas

Question

This is my dataframe:

d = {'id':['a','a','a','b','b','b','c','c','c','d','d','d'],
 'seg':[1,2,3,1,2,3,1,2,3,1,2,3],
 'type':['y','y','y','z','z','z','y1','y1','z','z','z','y2']
}
df = pd.DataFrame(d)

I am trying to filter id based on the following conditions:

only z
only y
both y and z

For first and second condition I can do the following:

df_y = df.loc[df['type'].str.contains('y')]
df_z = df.loc[df['type'] == 'z']

I can then take unique for id column.

How can I check if both y and z exist in type column for the same id?

Outputs:

df_y 

id
a
c
d

df_z 

id
b

Required outputs:

df_y

id
a

df_z

id
b

df_y_z

id
c
d

Scott Boston · Accepted Answer

Updated:

df[df.groupby('id')['type']\
     .transform(lambda x: x[x.str.contains('y|z')].nunique()) == 2]

Output:

   id  seg type
6   c    1   y1
7   c    2   y1
8   c    3    z
9   d    1    z
10  d    2    z
11  d    3   y2

Note: you can use contains or startwith if you just want those types that start with y or z.

IIUC, to get df_y_z, first filter your dataframe down to only rows with y or z if you have more types, then groupby id, and count the number of unique types if equal to 2 then you have groups with both y and z:

df.loc[df.query('type in ["y","z"]').groupby('id')['type'].transform('nunique') == 2]

Output:

   id  seg type
6   c    1    y
7   c    2    y
8   c    3    z
9   d    1    z
10  d    2    z
11  d    3    y

For an id in multiple rows find if values exists in pandas

Answers (2)

Related Questions