Reputation: 131
I've researched this question heavily for the past few days and I still cannot find suggestions to my problem.
Below is an example of my dataframe titled 'dfs'. There are around 80 columns, only 4 shown in the below example.
dfs is a large dataframe consisting of rows of data reported every 15 minutes for over 12 months (i.e. 2015-08-01 00:00:00 to 2016-09-30 23:45:00). The Datetime column is in the format datetime.
...
...
I want to export (or write) multiple monthly csv files, which are snippets of monthly data taken from the original large csv file (dfs). For each month, I want a file to be written that contains the the raw data, day data (6am-6pm) and night data (6pm-6am). I also want the name of each monthly file to be automated so it knows whether to call itself dfs_%Y%m, or dfs_day_%Y%m, or dfs_night_%Y%m depending on the data it contains.
At the moment I am writing out over 180 lines of code to export each csv file.
For example:
I create monthly raw, day and night files by grabbing the data between the datetimes listed below from the index Datetime column
dfs201508 = dfs.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']
dfs201508Day = dfsDay.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']
dfs201508Night = dfsNight.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']
Then I export these files to their respective outputpaths and give them a filename
dfs201508 = dfs201508.to_csv(outputpath+"dfs201508.csv")
dfs201508Day = dfs201508Day.to_csv(outputpathDay+"dfs_day_201508.csv")
dfs201508Night = dfs201508Night.to_csv(outputpathNight+"dfs_night_201508.csv")
What I want to write is something like this
dfs_%Y%m = dfs.ix["%Y%m"]
dfs_day_%Y%m = dfs.ix["%Y%m(between 6am-6pm)"]
dfs_night_%Y%m = dfs.ix["%Y%m(between 6pm-6am)"]
dfs_%Y%m = dfs_%Y%m.to_csv(outputpath +"dfs_%Y%m.csv")
dfs_day_%Y%m = dfs_day_%Y%m.to_csv(outputpath%day +"dfs_day_%Y%m.csv")
dfs_night_%Y%m = dfs_night_%Y%m.to_csv(outputpath%night +"dfs_night_%Y%m.csv")
Any suggestions on the code to automate this process would be greatly appreciated.
Here are some links to pages I researched:
https://www.youtube.com/watch?v=aeZKJGEfD7U
Writing multiple Python dictionaries to csv file
Open a file name +date as csv in Python
Upvotes: 1
Views: 1681
Reputation: 1599
You can use a for
loop to iterate over the years and months contained within dfs
. I created a dummy dataframe called DF
in the below example, which contains just three sample columns:
dates Egen1_kwh Egen2_kwh
2016-01-01 00:00:00 15895880 15877364
2016-01-01 00:15:00 15895880 15877364
2016-01-01 00:30:00 15895880 15877364
2016-01-01 00:45:00 15895880 15877364
2016-01-01 01:00:00 15895880 15877364
The below code filters the main dataframe DF
into smaller dataframes (NIGHT
and DAY
) for each month within each year and saves them to as .csv
with a name corresponding to their date (e.g. 2016_1_DAY
and 2016_1_NIGHT
for Jan 2016 Day and Jan 2016 Night).
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
from random import randint
# I defined a sample dataframe with dummy data
start = datetime.datetime(2016,1,1,0,0)
dates = [start + relativedelta(minutes=15*i) for i in range(0,10000)]
Egen1_kwh = randint(15860938,15898938)
Egen2_kwh = randint(15860938,15898938)
DF = pd.DataFrame({
'dates': dates,
'Egen1_kwh': Egen1_kwh,
'Egen2_kwh': Egen2_kwh,
})
# define when day starts and ends (MUST USE 24 CLOCK)
day = {
'start': datetime.time(6,0), # start at 6am (6:00)
'end': datetime.time(18,0) # ends at 6pm (18:00)
}
# capture years that appear in dataframe
min_year = DF.dates.min().year
max_year = DF.dates.max().year
if min_year == max_year:
yearRange = [min_year]
else:
yearRange = range(min_year, max_year+1)
# iterate over each year and each month within each year
for year in yearRange:
for month in range(1,13):
# filter to show NIGHT and DAY dataframe for given month within given year
NIGHT = DF[(DF.dates >= datetime.datetime(year, month, 1)) &
(DF.dates <= datetime.datetime(year, month, 1) + relativedelta(months=1) - relativedelta(days=1)) &
((DF.dates.apply(lambda x: x.time()) <= day['start']) | (DF.dates.apply(lambda x: x.time()) >= day['end']))]
DAY = DF[(DF.dates >= datetime.datetime(year, month, 1)) &
(DF.dates <= datetime.datetime(year, month, 1) + relativedelta(months=1) - relativedelta(days=1)) &
((DF.dates.apply(lambda x: x.time()) > day['start']) & (DF.dates.apply(lambda x: x.time()) < day['end']))]
# save to .csv with date and time in file name
# specify the save path of your choice
path_night = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_NIGHT.csv'.format(year, month)
path_day = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_DAY.csv'.format(year, month)
# some of the above NIGHT / DAY filtering will return no rows.
# Check for this, and only save if the dataframe contains rows
if NIGHT.shape[0] > 0:
NIGHT.to_csv(path_night, index=False)
if DAY.shape[0] > 0:
DAY.to_csv(path_day, index=False)
Upvotes: 1