UserYmY

Reputation: 8554

Python Pandas: How to use 'where' and 'isin' with two conditions?

I have a dataframe 'dfm' :

      match_x            org_o           group   match_y
0       012  012 Smile Communications     92      012
1       012                 012 Smile     92      000
2   10types                   10TYPES     93      10types
3   10types               10types.com     97      10types
4  360works                  360WORKS     94      360works
5  360works              360works.com     94      360works

I would like to add a column called 'tag' to 'a'. For each org in dfm, when match_x and match_y are equal and they have one unique group, the tag should be 'TP'; otherwise it should be 'FN'. Here is the code I have used:

a['tag'] = np.where(((a['match_x'] == a['match_y']) & (a.groupby(['group', 'match_x','match_y'])['group'].count() == 1)),'TP', 'FN')

but I am receiving this error:

TypeError: 'DataFrameGroupBy' object is not callable

Does anybody know how to do it?

Upvotes: 0

Views: 1357

Answers (1)

firelynx

Reputation: 32224

Let's break down your huge statement a bit:

a['tag'] = np.where(((a['match_x'] == a['match_y']) & (a.groupby(['group', 'match_x','match_y'])['group'].count() == 1)),'TP', 'FN')

Lifting out the mask:

mask = ((a['match_x'] == a['match_y']) & (a.groupby(['group', 'match_x','match_y'])['group'].count() == 1))
a['tag'] = np.where(mask,'TP', 'FN')

Breaking down the mask:

mask_x_y_equal = a['match_x'] == a['match_y']
single_line = a.groupby(['group', 'match_x','match_y']).size() == 1
mask = (mask_x_y_equal & single_line)
a['tag'] = np.where(mask,'TP', 'FN')

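Written this way, the problem becomes more obvious. The single_line mask is not the same length as mask_x_y_equal: it has one entry per (group, match_x, match_y) combination and is indexed by those keys, while mask_x_y_equal has one entry per row of a. This is a problem because & does not combine the two series position by position; pandas matches them up by index label, and these two indexes have nothing in common. The combined mask is therefore not the per-row mask you wanted, which means you may have a silent error here.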
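To see the mismatch concretely, here is a small sketch. It assumes the frame `a` looks like the table in the question (rebuilt by hand below, with the match columns as strings) and simply compares the two halves of the mask.

```python
import pandas as pd

# Rebuilt from the table in the question (an assumption about the real data);
# the match columns are strings so that '012' keeps its leading zero.
a = pd.DataFrame({
    'match_x': ['012', '012', '10types', '10types', '360works', '360works'],
    'org_o': ['012 Smile Communications', '012 Smile', '10TYPES',
              '10types.com', '360WORKS', '360works.com'],
    'group': [92, 92, 93, 97, 94, 94],
    'match_y': ['012', '000', '10types', '10types', '360works', '360works'],
})

mask_x_y_equal = a['match_x'] == a['match_y']
single_line = a.groupby(['group', 'match_x', 'match_y']).size() == 1

print(len(mask_x_y_equal))   # one entry per row of a
print(len(single_line))      # one entry per (group, match_x, match_y) combination
print(mask_x_y_equal.index)  # the row labels of a
print(single_line.index)     # a MultiIndex built from the group keys
```

The two series differ both in length and in what their index means, so they cannot be combined row by row as they stand.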

We can remove this silent error by operating inside a dataframe:

df_grouped = a.groupby(['group', 'match_x','match_y']).size() # size() does what you do with ['group'].count(), but a bit cleaner.
df_grouped = df_grouped.reset_index(name='count') # This turns df_grouped into a dataframe, with the group sizes in a 'count' column.
df_grouped['equal'] = df_grouped['match_x'] == df_grouped['match_y'] # The mask is now part of the dataframe

mask = (df_grouped['equal'] & (df_grouped['count'] == 1)) # Now we create your composite mask with comparable indices
df_grouped['tag'] = np.where(mask, 'TP', 'FN')
a = a.merge(df_grouped[['group', 'match_x', 'match_y', 'tag']], on=['group', 'match_x', 'match_y'], how='left') # df_grouped has one row per group, so merge the tag back onto the rows of a

This may or may not solve your "TypeError: 'DataFrameGroupBy' object is not callable". Either way, breaking your statement into multiple lines will make it much clearer where the error comes from.
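If the goal is a mask that lines up with the rows of `a` directly, one possible alternative is `groupby(...).transform('count')`, which returns one value per original row instead of one per group. This is only a sketch: it rebuilds `a` by hand from the table in the question, and it assumes "one unique group" means the (group, match_x, match_y) combination occurs exactly once, which is how the mask above reads it.

```python
import numpy as np
import pandas as pd

# Rebuilt from the table in the question (an assumption about the real data).
a = pd.DataFrame({
    'match_x': ['012', '012', '10types', '10types', '360works', '360works'],
    'org_o': ['012 Smile Communications', '012 Smile', '10TYPES',
              '10types.com', '360WORKS', '360works.com'],
    'group': [92, 92, 93, 97, 94, 94],
    'match_y': ['012', '000', '10types', '10types', '360works', '360works'],
})

# transform('count') broadcasts each group's row count back to every row,
# so group_size has the same index as a.
group_size = a.groupby(['group', 'match_x', 'match_y'])['group'].transform('count')

# Both operands now share a's index, so & combines them row by row.
mask = (a['match_x'] == a['match_y']) & (group_size == 1)
a['tag'] = np.where(mask, 'TP', 'FN')

print(a[['match_x', 'match_y', 'group', 'tag']])
```

With this reading, the two 360works rows share one (group, match_x, match_y) combination and so get 'FN'. If "one unique group" is meant differently, the grouping keys would need to change accordingly.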

Upvotes: 2
