pandas dataframe groups check the number of unique values of a column is one but exclude empty strings

Question

I have the following df,

id    invoice_no
1     6636
1     6637
2     6639
2     6639
3     
3    
4     6635
4     6635
4     6635

the invoice_no for id 3 are all empty strings or spaces; I want to

df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1

but also consider spaces and empty string invoice_no in each group as same_invoice_no = False; I am wondering how to do that. The result will look like,

id    invoice_no    same_invoice_no
1     6636          False
1     6637          False
2     6639          True
2     6639          True
3                   False
3                   False
4     6635          True
4     6635          True
4     6635          True

Vaishali · Accepted Answer

Empty strings equate to True but NaNs don't. Replace empty strings by Numpy nan

df.replace('', np.nan, inplace = True)
df['same_invoice_no'] = df.groupby("id")["invoice_no"].transform('nunique') == 1

    id  invoice_no  same_invoice_no
0   1   6636.0      False
1   1   6637.0      False
2   2   6639.0      True
3   2   6639.0      True
4   3   NaN         False
5   3   NaN         False
6   4   6635.0      True
7   4   6635.0      True
8   4   6635.0      True

pandas dataframe groups check the number of unique values of a column is one but exclude empty strings

Answers (1)

Related Questions