Reputation: 1568
I am trying to create a Date column from multiple columns Year, Month, and Day, however I get a few errors using standard procedure for creating date columns.
import pandas as pd
from urllib.request import urlopen
url = "https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h2017.txt.gz&dir=data/historical/stdmet/"
data_csv = urlopen(url)
df = pd.read_csv(data_csv, delim_whitespace=True, index_col=0, parse_dates=True)
#Reset Index
df.reset_index(level=0, inplace=True)
#remove 1st row contains erronous characters
df = df.iloc[1:]
#Rename Year column
df = df.rename(columns={'#YY': 'YY'})
df['Date'] = pd.to_datetime((df.YY*10000+df.MM*100+df.DD).apply(str),format='%Y%m%d')
The last command does not work properly as no date column is created. Result:
ValueError: unconverted data remains: 4201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201
Upvotes: 2
Views: 3862
Reputation: 59549
pd.to_datetime
can automatically parse dates from multiple columns if they are named properly ('year', 'month', 'day', 'hour', 'minute'
)
pd.to_datetime(df[['YY', 'MM', 'DD']].rename(columns={'YY': 'year', 'MM': 'month', 'DD': 'day'}))
1 2017-01-02
2 2017-01-02
3 2017-01-02
4 2017-01-02
5 2017-01-02
...
2427 2017-03-05
2428 2017-03-05
2429 2017-03-05
2430 2017-03-05
You can also add hours and minutes:
pd.to_datetime(df[['YY', 'MM', 'DD', 'hh', 'mm']].rename(
columns={'YY': 'year', 'MM': 'month', 'DD': 'day',
'hh': 'hour', 'mm': 'minute'}))
#1 2017-01-02 06:00:00
#2 2017-01-02 06:20:00
#...
#2429 2017-03-05 01:40:00
#2430 2017-03-05 02:00:00
Upvotes: 5