Baobab1988
Baobab1988

Reputation: 715

How to select all rows for specific ID only if another column contains specific value

I have a CSV file that contains hundreds of rows with repeatable IDs. Would there be a way to easily select rows for each ID but only if column customerCount has all the values for corresponding ID = 0?

My CSV file structure:

  report_date     id    customerCount    orderNr
  2020-02-20    123        12              10
  2020-02-19    123        18              11
  2020-02-18    123        0               12
  2020-02-20    321        0               0
  2020-02-19    321        0               0
  2020-02-18    321        0               0
  2020-02-20    456        17              13
  2020-02-19    456        0               0
  2020-02-18    456        15              14
  2020-02-20    654        0               0
  2020-02-19    654        0               0
  2020-02-18    654        0               0
  and so on...

Desired output CSV:

id    customerCount
321         0
654         0

My code so far (throws TypeError: 'method' object is not subscriptable):

import pandas as pd

df = pd.read_csv('path/to/my/file.csv')
df1 = df.loc[(df.groupby['id'](['customerCount'] == 0)]
df1.to_csv('/path/to/my.output.csv')

Thanks in advance!

Upvotes: 1

Views: 2087

Answers (1)

jezrael
jezrael

Reputation: 863146

First idea is use DataFrame.all with mask, then filter mask by itself and convert to DataFrame:

s = (df['customerCount'] == 0).groupby(df['id']).all()

df = s[s].reset_index()
df['customerCount'] = 0
print (df)
    id  customerCount
0  321              0
1  654              0

Or use Series.isin with inverted mask by ~ all id which no 0 and create DataFrame by constructor:

ids = df.loc[~df['id'].isin(df.loc[df['customerCount'] != 0, 'id']), 'id'].unique()

df = pd.DataFrame({'id':ids, 'customerCount':0})
print (df)
    id  customerCount
0  321              0
1  654              0

EDIT: Create new id values by change mask by remove ~:

mask = df['id'].isin(df.loc[df['customerCount'] != 0, 'id'])
ids1 = df.loc[~mask, 'id'].unique()
ids2 = df.loc[mask, 'id'].unique()

df1 = pd.DataFrame({'id':ids1, 'customerCount':0})
df2 = pd.DataFrame({'id':ids2, 'customerCount':'>0'})
print (df1)
    id  customerCount
0  321              0
1  654              0

print (df2)
    id customerCount
0  123            >0
1  456            >0

Upvotes: 1

Related Questions