tm553
tm553

Reputation: 97

How to split csv file into respective csv files based on date in first column (python)?

I have a large CSV with multiple years of electricity load data, and I would like to split it into multiple files on a month and year basis - i.e to return individual CSVs for Jan, Feb, Mar etc for 2013, 2014, 2015 etc.

I have reviewed a lot of the solutions in the forums, and have not had any luck. My current file is structured as follows;


01-Jan-11,1,34606,34677,35648,35685,31058,484,1730

01-Jan-11,2,35092,35142,36089,36142,31460,520,1730

01-Jan-11,3,34725,34761,36256,36234,31109,520,1730

01-Jan-11,4,33649,33698,35628,35675,30174,512,1730


The name of the column header in the date column is "SETTLEMENT DATE". I am familiar with using pandas and df's but cannot seem to wrap my head around how to do this one!

Thanks!

Upvotes: 4

Views: 3708

Answers (1)

Rayhane Mama
Rayhane Mama

Reputation: 2424

Here is how you can do it.

First you read your data from your csv_file and save your cols like this:

df = pd.read_csv('path/input.csv')
cols = df.columns

then you add two columns to your dataframe which are Month and Year:

df['Year'] = df['SETTLEMENT DATE'].apply(lambda x: x.split('-')[-1])
df['Month'] = df['SETTLEMENT DATE'].apply(lambda x: x.split('-')[1])

All you have to do left is to write your dataframe in separate csv files either by month or year with each file carrying the name of the month or year as follows:

for i in set(df.Year): # for classified by years files
    filename = "path/"+i+".csv"
    df.loc[df.Year == i].to_csv(filename,index=False,columns=cols)

for i in set(df.Month): # for classified by months files
    filename = "path/"+i+".csv"
    df.loc[df.Month == i].to_csv(filename,index=False,columns=cols)

Hope this was helpful.

Upvotes: 5

Related Questions