Reputation: 97
I have a large CSV with multiple years of electricity load data, and I would like to split it into multiple files on a month and year basis - i.e to return individual CSVs for Jan, Feb, Mar etc for 2013, 2014, 2015 etc.
I have reviewed a lot of the solutions in the forums, and have not had any luck. My current file is structured as follows;
01-Jan-11,1,34606,34677,35648,35685,31058,484,1730
01-Jan-11,2,35092,35142,36089,36142,31460,520,1730
01-Jan-11,3,34725,34761,36256,36234,31109,520,1730
01-Jan-11,4,33649,33698,35628,35675,30174,512,1730
The name of the column header in the date column is "SETTLEMENT DATE". I am familiar with using pandas and df's but cannot seem to wrap my head around how to do this one!
Thanks!
Upvotes: 4
Views: 3708
Reputation: 2424
Here is how you can do it.
First you read your data from your csv_file
and save your cols
like this:
df = pd.read_csv('path/input.csv')
cols = df.columns
then you add two columns to your dataframe
which are Month
and Year
:
df['Year'] = df['SETTLEMENT DATE'].apply(lambda x: x.split('-')[-1])
df['Month'] = df['SETTLEMENT DATE'].apply(lambda x: x.split('-')[1])
All you have to do left is to write your dataframe in separate csv files
either by month or year with each file
carrying the name of the month
or year
as follows:
for i in set(df.Year): # for classified by years files
filename = "path/"+i+".csv"
df.loc[df.Year == i].to_csv(filename,index=False,columns=cols)
for i in set(df.Month): # for classified by months files
filename = "path/"+i+".csv"
df.loc[df.Month == i].to_csv(filename,index=False,columns=cols)
Hope this was helpful.
Upvotes: 5