Reputation: 269
I have a dataframe already sorted by Dr
:
>>> df
Date Dr Cr Amount Cum Sum
0 2022-01-01 B1 D1 1000 1000
3 2022-01-05 B1 E1 -500 500
2 2022-01-01 D1 B1 -1000 -1000
1 2022-01-05 E1 B1 500 500
df.to_csv(index=False)
produces:
However, I would to to export .to_csv()
so that every new entry starts at the next Dr
value.
I have attempted incorrect solutions such as df.groupby(‘Dr’).to_csv(index=False)
and also other solutions with .to_csv(mode=‘a’)
but without any luck.
Ideally, the following .csv output will be achieved, with the space, name of Dr
and the dataframe headers repeated:
Thank you.
Upvotes: 0
Views: 987
Reputation: 153
You need to use apply
with groupby
import pandas as pd
filename = 'your-filename.csv'
groupby_col = 'Dr'
def write_csv(df):
group = df[groupby_col].iat[0]
# first write only group name to the file using a temporary dataframe
header_df = pd.DataFrame(group, index=[0], columns=['group'])
header_df.to_csv(filename, index=False, mode='a', header=False)
# now write the data
df.to_csv(filename, index=False, mode='a')
df = pd.read_csv(r'.\POC\files\drcr.csv')
df.groupby(groupby_col).apply(write_csv)
print("Done")
Upvotes: 1
Reputation: 13447
It looks to me a really custom solution and I don't think you could use pandas for that.
import pandas as pd
data = {'Date': {0: '2022-01-01', 1: '2022-01-05', 2: '2022-01-01', 3: '2022-01-05'},
'Dr': {0: 'B1', 1: 'B1', 2: 'D1', 3: 'E1'},
'Cr': {0: 'D1', 1: 'E1', 2: 'B1', 3: 'B1'},
'Amount': {0: '1000', 1: '-500', 2: '-1000', 3: '500'},
'CumSum': {0: '1000', 1: '500', 2: '-1000', 3: '500'}}
df = pd.DataFrame(data)
lst = list(df.groupby("Dr"))
cols = df.columns
with open('file.txt', 'w') as f:
for l in lst:
f.write(l[0])
f.write('\n')
for col in cols[:-1]:
f.write(f"{col} ")
f.write(f"{cols[-1]}\n")
for i, row in l[1].iterrows():
for col in cols[:-1]:
f.write(f"{row[col]} ")
f.write(f"{row[cols[-1]]}\n")
f.write("\n")
Upvotes: 0
Reputation: 219
Have you considered looping through the unique values in the column and filtering the dataframe?
You could try something like
for dr_value in df.Dr.unique():
df[df.Dr==dr_value].to_csv(f"filename_{dr_value}.csv", index=False)
Upvotes: 1