sampeterson

Reputation: 479

Create folder structure based on values from a pandas dataframe

I have a pandas dataframe that contains data of different users with corresponding values on a single date, like this:

import pandas as pd
d = {'user': ['Peter', 'Peter', 'Peter', 'Peter', 'David', 'David', 'David', 'Emma', 'Joyce', 'Joyce', 'Joyce'], 'date': ['2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04'], 'value': ['5', '4', '3', '3', '6', '1', '5', '7', '1', '7', '6']}
df = pd.DataFrame(data=d)
df

user    date        value
Peter   2019-03-04  5
Peter   2019-03-04  4
Peter   2019-03-04  3
Peter   2019-03-04  3
David   2019-03-04  6
David   2019-03-04  1
David   2019-03-04  5
Emma    2019-03-04  7
Joyce   2019-03-04  1
Joyce   2019-03-04  7
Joyce   2019-03-04  6

Using the code below, I can group this dataframe by user and export each group's rows to its own CSV file.

import os

# i is the user name, x is the sub-dataframe holding that user's rows
for i, x in df.groupby('user'):
    p = os.path.join(os.getcwd(), "{}.csv".format(i))
    x.to_csv(p, index=False)

The file Peter.csv for example looks like this:

user    date        value
Peter   2019-03-04  5
Peter   2019-03-04  4
Peter   2019-03-04  3
Peter   2019-03-04  3

Now I would like to store these files per user in the following folder structure:

Report/
├── Reports per date/
│   ├── 2019-03-01/
│   ├── 2019-03-02/
│   ├── 2019-03-03/
│   └── 2019-03-04/
│       └── Users/
│           ├── Peter/
│           │   └── Peter.csv
│           ├── David/
│           │   └── David.csv
│           ├── Emma/
│           │   └── Emma.csv
│           └── Joyce/
│               └── Joyce.csv
│
└── Reports per month/

I know I can generate a folder named after each user like so:

import os
root_path = '/root/path/'
users_dirs = df['user'].unique().tolist()
for folder in users_dirs:
    os.mkdir(os.path.join(root_path,str(folder)))

But I am struggling to combine these two pieces of code so that each user's file is stored in the folder structure described above. Any ideas?

Upvotes: 0

Views: 1868

Answers (1)

Corralien

Reputation: 120419

Use the pathlib module from the standard library to build the target file path for each record. Then group by that path, create the parent folder if it does not exist, and save each user's data.

import pathlib

rootdir = pathlib.Path('./Report')

report_per_date = df.apply(lambda x: rootdir / 'report_per_date' / x['date'] / 'Users' / x['user'] / f"{x['user']}.csv", axis='columns')

for csvfile, data in df.groupby(report_per_date):
    csvfile.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(csvfile, index=False)

$ tree Report
Report/
└── report_per_date
    └── 2019-03-04
        └── Users
            ├── David
            │   └── David.csv
            ├── Emma
            │   └── Emma.csv
            ├── Joyce
            │   └── Joyce.csv
            └── Peter
                └── Peter.csv

7 directories, 4 files
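
If you prefer the folder names from the question ('Reports per date' rather than 'report_per_date'), one possible variant is to group on the date and user columns directly instead of on a precomputed path. This is only a sketch: the shortened sample data, the temporary root directory and the `written` list are assumptions added for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Shortened sample data in the same shape as the question's dataframe.
df = pd.DataFrame({
    'user': ['Peter', 'Peter', 'David', 'Emma', 'Joyce'],
    'date': ['2019-03-04'] * 5,
    'value': ['5', '4', '6', '7', '1'],
})

# Assumption: a temporary directory stands in for the real report root.
rootdir = Path(tempfile.mkdtemp()) / 'Report'

written = []  # hypothetical helper list, only used to inspect the result
# Grouping on both columns yields one (date, user) key per output file.
for (date, user), data in df.groupby(['date', 'user']):
    userdir = rootdir / 'Reports per date' / date / 'Users' / user
    # parents=True creates the intermediate folders; exist_ok=True allows reruns.
    userdir.mkdir(parents=True, exist_ok=True)
    csvfile = userdir / f'{user}.csv'
    data.to_csv(csvfile, index=False)
    written.append(csvfile)
```

Because the date is part of the group key, a dataframe covering several dates should get one dated folder per date without further changes.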

Upvotes: 2
