Monte Cristo

Reputation: 131

Python - Writing Multiple Monthly CSV Files from One Large CSV File - Automating Filenames to Reflect Month

I've researched this question heavily for the past few days and still cannot find a solution to my problem.

Below is an example of my dataframe, titled 'dfs'. It has around 80 columns; only 4 are shown in the example.

dfs is a large dataframe consisting of rows of data reported every 15 minutes for over 12 months (i.e. 2015-08-01 00:00:00 to 2016-09-30 23:45:00). The Datetime column is of type datetime.

[Images of the dataframe not available]

I want to export (or write) multiple monthly csv files, each a slice of one month's data taken from the original large csv file (dfs). For each month, I want files written that contain the raw data, the day data (6am-6pm) and the night data (6pm-6am). I also want the name of each monthly file to be generated automatically, so that it is called dfs_%Y%m, dfs_day_%Y%m or dfs_night_%Y%m depending on the data it contains.

At the moment I am writing out over 180 lines of code to export each csv file.

For example:

I create monthly raw, day and night files by selecting the rows between the datetimes listed below from the Datetime index:

dfs201508 = dfs.loc['2015-08-01 00:00:00':'2015-08-31 23:45:00']

dfs201508Day = dfsDay.loc['2015-08-01 00:00:00':'2015-08-31 23:45:00']

dfs201508Night = dfsNight.loc['2015-08-01 00:00:00':'2015-08-31 23:45:00']

Then I export these files to their respective output paths and give each a filename:

dfs201508.to_csv(outputpath+"dfs201508.csv")

dfs201508Day.to_csv(outputpathDay+"dfs_day_201508.csv")

dfs201508Night.to_csv(outputpathNight+"dfs_night_201508.csv")

What I want to write is something like this

dfs_%Y%m = dfs.ix["%Y%m"]

dfs_day_%Y%m = dfs.ix["%Y%m(between 6am-6pm)"]

dfs_night_%Y%m = dfs.ix["%Y%m(between 6pm-6am)"]

dfs_%Y%m = dfs_%Y%m.to_csv(outputpath +"dfs_%Y%m.csv")

dfs_day_%Y%m = dfs_day_%Y%m.to_csv(outputpath%day +"dfs_day_%Y%m.csv")

dfs_night_%Y%m = dfs_night_%Y%m.to_csv(outputpath%night +"dfs_night_%Y%m.csv")

Any suggestions on the code to automate this process would be greatly appreciated.

Here are some links to pages I researched:

https://www.youtube.com/watch?v=aeZKJGEfD7U

Writing multiple Python dictionaries to csv file

Open a file name +date as csv in Python

Upvotes: 1

Views: 1681

Answers (1)

NickBraunagel

Reputation: 1599

You can use a for loop to iterate over the years and months contained within dfs. I created a dummy dataframe called DF in the below example, which contains just three sample columns:

dates               Egen1_kwh   Egen2_kwh
2016-01-01 00:00:00 15895880    15877364
2016-01-01 00:15:00 15895880    15877364
2016-01-01 00:30:00 15895880    15877364
2016-01-01 00:45:00 15895880    15877364
2016-01-01 01:00:00 15895880    15877364

The code below filters the main dataframe DF into smaller dataframes (NIGHT and DAY) for each month within each year and saves them as .csv files named after their date (e.g. 2016_1_DAY and 2016_1_NIGHT for Jan 2016 Day and Jan 2016 Night).

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
from random import randint

# I defined a sample dataframe with dummy data
start = datetime.datetime(2016,1,1,0,0)
dates = [start + relativedelta(minutes=15*i) for i in range(0,10000)]
Egen1_kwh = randint(15860938,15898938)
Egen2_kwh = randint(15860938,15898938)

DF = pd.DataFrame({
        'dates': dates,
        'Egen1_kwh': Egen1_kwh,
        'Egen2_kwh': Egen2_kwh,
    })


# define when day starts and ends (MUST USE 24-HOUR CLOCK)
day = {
        'start': datetime.time(6,0),  # start at 6am (6:00)
        'end': datetime.time(18,0)  # ends at 6pm (18:00)
      }


# capture years that appear in dataframe
min_year = DF.dates.min().year
max_year = DF.dates.max().year

# range() also covers the single-year case (min_year == max_year)
yearRange = range(min_year, max_year + 1)

# iterate over each year and each month within each year
for year in yearRange:
    for month in range(1,13):

        # filter to show NIGHT and DAY dataframe for given month within given year
        # the month spans [first day of month, first day of next month),
        # so every row of the last day of the month is included
        month_start = datetime.datetime(year, month, 1)
        month_end = month_start + relativedelta(months=1)
        in_month = (DF.dates >= month_start) & (DF.dates < month_end)
        times = DF.dates.dt.time

        NIGHT = DF[in_month & ((times <= day['start']) | (times >= day['end']))]

        DAY = DF[in_month & ((times > day['start']) & (times < day['end']))]

        # save to .csv with date and time in file name
        # specify the save path of your choice
        path_night = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_NIGHT.csv'.format(year, month)
        path_day = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_DAY.csv'.format(year, month)

        # some of the above NIGHT / DAY filtering will return no rows.
        # Check for this, and only save if the dataframe contains rows
        if NIGHT.shape[0] > 0:
            NIGHT.to_csv(path_night, index=False)
        if DAY.shape[0] > 0:
            DAY.to_csv(path_day, index=False)
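
As an alternative to looping over every year and month, here is a minimal sketch (not part of the original answer) that groups the rows by calendar month and writes the raw, day and night files in one pass, with the dfs_%Y%m naming the question asks for. It assumes dfs has a DatetimeIndex, as in the question. The function name write_monthly_csvs, the parameters outdir and prefix, and the use of a single output directory are illustrative assumptions. Day is taken here as 06:00 up to, but not including, 18:00; everything else counts as night.

```python
import os

import pandas as pd


def write_monthly_csvs(dfs, outdir, prefix="dfs"):
    """Write raw, day and night CSV files for each month found in dfs.

    Hypothetical helper: assumes dfs has a DatetimeIndex and that all
    files go into one directory (outdir). Returns the paths written.
    """
    written = []
    # Group rows by calendar month; each key is a pandas Period, e.g. 2015-08.
    for period, month_df in dfs.groupby(dfs.index.to_period("M")):
        stamp = period.strftime("%Y%m")  # e.g. "201508"

        # Assumed boundary rule: day is 06:00 <= time < 18:00, night is the rest.
        is_day = (month_df.index.hour >= 6) & (month_df.index.hour < 18)

        subsets = [
            ("{0}_{1}.csv", month_df),                # raw data
            ("{0}_day_{1}.csv", month_df[is_day]),    # day data
            ("{0}_night_{1}.csv", month_df[~is_day]), # night data
        ]
        for pattern, subset in subsets:
            if subset.empty:
                continue  # skip files that would contain no rows
            path = os.path.join(outdir, pattern.format(prefix, stamp))
            subset.to_csv(path)
            written.append(path)
    return written
```

Calling write_monthly_csvs(dfs, outputpath) would then produce files such as dfs_201508.csv, dfs_day_201508.csv and dfs_night_201508.csv for each month present in the index. If the day and night files need separate directories, as in the question, the path for each pattern could be chosen inside the inner loop.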

Upvotes: 1
