tedioustortoise

Reputation: 269

How to export pandas dataframe .to_csv, in multiple chunks, based upon criteria in one of the columns (similar to groupby)

I have a dataframe already sorted by Dr:

>>> df
         Date  Dr  Cr  Amount  Cum Sum
0 2022-01-01  B1  D1    1000     1000
3 2022-01-05  B1  E1    -500      500
2 2022-01-01  D1  B1   -1000    -1000
1 2022-01-05  E1  B1     500      500

df.to_csv(index=False) produces one continuous CSV with a single header row at the top.

However, I would like to export with .to_csv() so that every new group of entries starts at the next Dr value.

I have attempted incorrect solutions such as df.groupby('Dr').to_csv(index=False), and also other attempts with .to_csv(mode='a'), but without any luck.

Ideally, the exported .csv would look something like the following, with a blank line, the name of the Dr and the dataframe headers repeated for each group:

B1
Date,Dr,Cr,Amount,Cum Sum
2022-01-01,B1,D1,1000,1000
2022-01-05,B1,E1,-500,500

D1
Date,Dr,Cr,Amount,Cum Sum
2022-01-01,D1,B1,-1000,-1000

E1
Date,Dr,Cr,Amount,Cum Sum
2022-01-05,E1,B1,500,500

Thank you.

Upvotes: 0

Views: 987

Answers (3)

Kumar

Reputation: 153

You need to use apply with groupby

import pandas as pd

filename = 'your-filename.csv'
groupby_col = 'Dr'

def write_csv(df):
    group = df[groupby_col].iat[0]

    # first write only group name to the file using a temporary dataframe
    header_df = pd.DataFrame(group, index=[0], columns=['group'])
    header_df.to_csv(filename, index=False, mode='a', header=False)

    # now write the data
    df.to_csv(filename, index=False, mode='a')


df = pd.read_csv(r'.\POC\files\drcr.csv')
df.groupby(groupby_col).apply(write_csv)
print("Done")

Upvotes: 1

rpanai

Reputation: 13447

This looks like a really custom output to me, and I don't think you can get it from pandas' to_csv alone.

Data

import pandas as pd

data = {'Date': {0: '2022-01-01', 1: '2022-01-05', 2: '2022-01-01', 3: '2022-01-05'},
 'Dr': {0: 'B1', 1: 'B1', 2: 'D1', 3: 'E1'},
 'Cr': {0: 'D1', 1: 'E1', 2: 'B1', 3: 'B1'},
 'Amount': {0: '1000', 1: '-500', 2: '-1000', 3: '500'},
 'CumSum': {0: '1000', 1: '500', 2: '-1000', 3: '500'}}

df = pd.DataFrame(data)

Group by and write to file

lst = list(df.groupby("Dr"))

cols = df.columns
with open('file.txt', 'w') as f:
    for l in lst:
        f.write(l[0])
        f.write('\n')
        for col in cols[:-1]:
            f.write(f"{col} ")
        f.write(f"{cols[-1]}\n")
        for i, row in l[1].iterrows():
            for col in cols[:-1]:
                f.write(f"{row[col]} ")
            f.write(f"{row[cols[-1]]}\n")
        f.write("\n")

Upvotes: 0

Wolf A

Reputation: 219

Have you considered looping through the unique values in the column and filtering the dataframe?

You could try something like

for dr_value in df.Dr.unique():
    df[df.Dr==dr_value].to_csv(f"filename_{dr_value}.csv", index=False)
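
If you need everything in one file instead, with the Dr value and the headers repeated per block as in the question, the same loop can write to a single open file handle, roughly:

with open("combined.csv", "w", newline="") as f:
    for dr_value in df.Dr.unique():
        f.write(f"{dr_value}\n")                      # group name line
        df[df.Dr == dr_value].to_csv(f, index=False)  # headers + rows for this group
        f.write("\n")                                 # blank line between groups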

Upvotes: 1
