Arogbonlo Samuel
Arogbonlo Samuel

Reputation: 39

How to extract dates in ascending order from a csv file in Python

I created a script to extract dates and prices from a CSV file and print it in another CSV file, but the issue is the dates in the raw CSV file are arranged randomly. How do I extract the date to return in ascending order in my csv file instead of randomly? I'd like to do this in Python, you download the entire CSV file here: http://www.eia.gov/dnav/ng/hist/rngwhhdm.htm.

Sample CSV data below. The dates continue till 1997 and they are sorted in reverse already.

Henry Hub Natural Gas Spot Price
https://www.eia.gov/dnav/ng/hist/rngwhhdm.htm
14:35:57 GMT-0800 (Pacific Standard Time)
Data Source: Thomson Reuters
Month,Henry Hub Natural Gas Spot Price Dollars per Million Btu
Jan 2021,2.71
Dec 2020,2.59
Nov 2020,2.61
Oct 2020,2.39
Sep 2020,1.92
Aug 2020,2.3
Jul 2020,1.77
Jun 2020,1.63
May 2020,1.75
Apr 2020,1.74
Mar 2020,1.79
Feb 2020,1.91
Jan 2020,2.02
Dec 2019,2.22
Nov 2019,2.65
Oct 2019,2.33
Sep 2019,2.56
Aug 2019,2.22
Jul 2019,2.37
Jun 2019,2.4
May 2019,2.64
Apr 2019,2.65
Mar 2019,2.95
Feb 2019,2.69
Jan 2019,3.11
Dec 2018,4.04
Nov 2018,4.09
Oct 2018,3.28
Sep 2018,3

The code already written is shown below:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%m"))
df.to_csv('gas-details_month.csv', index=True)

#Other granularities and sections of the .csv file (year)
dfg_year = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('Y').sum()
df = pd.DataFrame(dfg_year, index=dfg_year.index.strftime("%Y"))
df.to_csv('gas-details_year.csv', index=True)

print("Extracted Successfully")

Upvotes: 0

Views: 356

Answers (2)

Mark Tolonen
Mark Tolonen

Reputation: 177526

You can indicate to read_csv to parse the dates and then sort by index:

import pandas as pd 
df = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv',
                 index_col='Month',parse_dates=True,infer_datetime_format=True,skiprows=4)
df.sort_index(inplace=True)
print(df)
            Henry Hub Natural Gas Spot Price Dollars per Million Btu
Month                                                               
1997-01-01                                               3.45       
1997-02-01                                               2.15       
1997-03-01                                               1.89       
1997-04-01                                               2.03       
1997-05-01                                               2.25       
...                                                       ...       
2020-09-01                                               1.92       
2020-10-01                                               2.39       
2020-11-01                                               2.61       
2020-12-01                                               2.59       
2021-01-01                                               2.71       

[289 rows x 1 columns]

Upvotes: 2

Syed
Syed

Reputation: 146

You can use the dataframe.sort_index() method as shown below:

import pandas as pd 

#Basic day granularity of .csv file (day)

dfg = pd.read_csv('Henry_Hub_Natural_Gas_Spot_Price.csv', skiprows=4)
dfg.index = pd.to_datetime(dfg["Day"],format='%m/%d/%Y')
dfg.sort_index()
dfg.to_csv('gas-details_day.csv', index=False) 

#Other granularities and sections of the .csv file (month)

dfg_month = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('M').sum()
df = pd.DataFrame(dfg_month, index=dfg_month.index.strftime("%m"))
df.sort_index()
df.to_csv('gas-details_month.csv', index=True)

#Other granularities and sections of the .csv file (year)
dfg_year = dfg['Henry Hub Natural Gas Spot Price Dollars per Million Btu'].resample('Y').sum()
df = pd.DataFrame(dfg_year, index=dfg_year.index.strftime("%Y"))
df.sort_index()
df.to_csv('gas-details_year.csv', index=True)

print("Exctracted Succesfully")

Upvotes: 0

Related Questions