Reputation: 715
I have a CSV file that contains hundreds of rows with repeatable IDs. Would there be a way to easily select rows for each ID but only if column customerCount has all the values for corresponding ID = 0?
My CSV file structure:
report_date id customerCount orderNr
2020-02-20 123 12 10
2020-02-19 123 18 11
2020-02-18 123 0 12
2020-02-20 321 0 0
2020-02-19 321 0 0
2020-02-18 321 0 0
2020-02-20 456 17 13
2020-02-19 456 0 0
2020-02-18 456 15 14
2020-02-20 654 0 0
2020-02-19 654 0 0
2020-02-18 654 0 0
and so on...
Desired output CSV:
id customerCount
321 0
654 0
My code so far (throws TypeError: 'method' object is not subscriptable
):
import pandas as pd
df = pd.read_csv('path/to/my/file.csv')
df1 = df.loc[(df.groupby['id'](['customerCount'] == 0)]
df1.to_csv('/path/to/my.output.csv')
Thanks in advance!
Upvotes: 1
Views: 2087
Reputation: 863146
First idea is use DataFrame.all
with mask, then filter mask by itself and convert to DataFrame
:
s = (df['customerCount'] == 0).groupby(df['id']).all()
df = s[s].reset_index()
df['customerCount'] = 0
print (df)
id customerCount
0 321 0
1 654 0
Or use Series.isin
with inverted mask by ~
all id
which no 0
and create DataFrame
by constructor:
ids = df.loc[~df['id'].isin(df.loc[df['customerCount'] != 0, 'id']), 'id'].unique()
df = pd.DataFrame({'id':ids, 'customerCount':0})
print (df)
id customerCount
0 321 0
1 654 0
EDIT: Create new id
values by change mask by remove ~
:
mask = df['id'].isin(df.loc[df['customerCount'] != 0, 'id'])
ids1 = df.loc[~mask, 'id'].unique()
ids2 = df.loc[mask, 'id'].unique()
df1 = pd.DataFrame({'id':ids1, 'customerCount':0})
df2 = pd.DataFrame({'id':ids2, 'customerCount':'>0'})
print (df1)
id customerCount
0 321 0
1 654 0
print (df2)
id customerCount
0 123 >0
1 456 >0
Upvotes: 1