I have a pandas dataframe that contains data of different users with corresponding values on a single date, like this:
import pandas as pd
d = {'user': ['Peter', 'Peter', 'Peter', 'Peter', 'David', 'David', 'David', 'Emma', 'Joyce', 'Joyce', 'Joyce'], 'date': ['2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04'], 'value': ['5', '4', '3', '3', '6', '1', '5', '7', '1', '7', '6']}
df = pd.DataFrame(data=d)
df
user date value
Peter 2019-03-04 5
Peter 2019-03-04 4
Peter 2019-03-04 3
Peter 2019-03-04 3
David 2019-03-04 6
David 2019-03-04 1
David 2019-03-04 5
Emma 2019-03-04 7
Joyce 2019-03-04 1
Joyce 2019-03-04 7
Joyce 2019-03-04 6
Using the code below, I can iterate over this dataframe and export all rows to a csv file grouped per user.
import os

# One CSV per user, written to the current working directory
for i, x in df.groupby('user'):
    p = os.path.join(os.getcwd(), "{}.csv".format(i))
    x.to_csv(p, index=False)
The file Peter.csv, for example, looks like this:
user date value
Peter 2019-03-04 5
Peter 2019-03-04 4
Peter 2019-03-04 3
Peter 2019-03-04 3
Now I would like to store these files per user in the following folder structure:
Report/
├── Reports per date/
│   ├── 2019-03-01/
│   ├── 2019-03-02/
│   ├── 2019-03-03/
│   └── 2019-03-04/
│       └── Users/
│           ├── Peter/
│           │   └── Peter.csv
│           ├── David/
│           │   └── David.csv
│           ├── Emma/
│           │   └── Emma.csv
│           └── Joyce/
│               └── Joyce.csv
│
└── Reports per month/
I know I can generate a folder named after each user like so:
import os
root_path = '/root/path/'
users_dirs = df['user'].unique().tolist()
for folder in users_dirs:
    os.mkdir(os.path.join(root_path, str(folder)))
But I am struggling to combine these two pieces of code so that each user's file is stored in the folder structure described above. Any ideas?
Upvotes: 0
Views: 1868
Reputation: 120419
Use the pathlib module from the standard library and build the target file path for each record. Then group the rows by that path, create the folder if it does not exist, and save each user's data.
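import pathlib

rootdir = pathlib.Path('./Report')
# Build the target CSV path for every row: Report/report_per_date/<date>/Users/<user>/<user>.csv
report_per_date = df.apply(lambda x: rootdir / 'report_per_date' / x['date'] / 'Users' / x['user'] / f"{x['user']}.csv", axis='columns')
# Rows that share a target path form one group, i.e. one file
for csvfile, data in df.groupby(report_per_date):
    csvfile.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    data.to_csv(csvfile, index=False)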
$ tree Report
Report/
└── report_per_date
    └── 2019-03-04
        └── Users
            ├── David
            │   └── David.csv
            ├── Emma
            │   └── Emma.csv
            ├── Joyce
            │   └── Joyce.csv
            └── Peter
                └── Peter.csv

7 directories, 4 files
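If the row-wise apply feels heavy on a larger dataframe, a possible variant is to group by the date and user columns directly and build the path once per group. This is only a sketch: the function name export_per_user and its return value are my own additions, not part of pandas or of the code above, and it assumes the dataframe has the 'date' and 'user' columns from the question.

```python
import pathlib

import pandas as pd


def export_per_user(df: pd.DataFrame, rootdir) -> list:
    """Write one CSV per (date, user) group under rootdir and return the paths.

    Hypothetical helper for illustration; the folder names mirror the
    layout used in the answer above.
    """
    rootdir = pathlib.Path(rootdir)
    written = []
    # Grouping by both columns yields one sub-frame per date/user pair.
    for (date, user), data in df.groupby(["date", "user"]):
        target = (
            rootdir / "report_per_date" / str(date) / "Users" / str(user) / f"{user}.csv"
        )
        # parents=True creates missing intermediate folders;
        # exist_ok=True means rerunning the export does not raise.
        target.parent.mkdir(parents=True, exist_ok=True)
        data.to_csv(target, index=False)
        written.append(target)
    return written
```

Calling export_per_user(df, './Report') with the dataframe from the question should give the same tree as shown above.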
Upvotes: 2