Reputation: 13
I have a problem combining the day month year columns to form a date column in a data frame using pd.to_datetime. Below is the dataframe,i'm working on and the columns Yr,Mo,Dy represents as year month day.
data = pd.read_table("/ALabs/wind.data",sep = ',')
Yr Mo Dy RPT VAL ROS KIL
61 1 1 15.04 14.96 13.17 9.29
61 1 2 14.71 NaN 10.83 6.50
61 1 3 18.50 16.88 12.33 10.13
So I've tried the below code, i get the following error: "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing"
Step 1:
data['Date'] = pd.to_datetime(data[['Yr','Mo','Dy']],format="%y-%m-%d")
Next I've tried converting Yr,Mo,Dy column datatype to datetime64 from int64 and assigning the result to new columns Year,Month,Day. Now when i try to combine the columns i'm getting the proper date format in the new date column and i have no idea how i got the desired result.
Step2:
data['Year'] = pd.to_datetime(data.Yr,format='%y').dt.year
data['Month'] = pd.to_datetime(data.Mo,format='%m').dt.month
data['Day'] = pd.to_datetime(data.Dy,format ='%d').dt.day
data['Date'] =pd.to_datetime(data[['Year','Month','Day']])
Result:
Yr Mo Dy Year Month Day Date
61 1 1 2061 1 1 2061-01-01
61 1 2 2061 1 2 2061-01-02
61 1 3 2061 1 3 2061-01-03
61 1 4 2061 1 4 2061-01-04
But if i try doing the same method by changing the column names from year,month, day to Yy,Mh,Di like in the below code. I get the same error "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing"
Step3:
data['Yy'] = pd.to_datetime(data.Yr,format='%y').dt.year
data['Mh'] = pd.to_datetime(data.Mo,format='%m').dt.month
data['Di'] = pd.to_datetime(data.Dy,format ='%d').dt.day
data['Date'] =pd.to_datetime(data[['Yy','Mh','Di']])
What i want to know :
1) Is it mandatory for the argument names to be 'Year' 'Month' 'Day' if we are using pd.to_datetime?
2) Is there any other way to combine the columns in a dataframe to form a date, rather than using this long method?
3) Is this error specific only to python version 3.7??
4)where have i gone wrong in Step 1 and Step 3 ,and why i'm getting o/p when i follow step 2 ?
Upvotes: 1
Views: 1488
Reputation: 13999
As per the pandas.to_datetime
docs, the column names really do have to be 'year'
, 'month'
, and 'day'
(capitalizing the first letter is fine). This explains the answer to all of your questions, and no it has nothing to do with the version of Python (and all recent versions of Pandas behave the same).
Also, you should be aware that when you call to_datetime
with a sequence of columns (as opposed to a single column/list of strings), the format
argument seems to be ignored. So you'll need to normalize your years (to 1961
or 2061
or 1061
, etc) yourself. Here's a complete example of how you could do the conversion in a single line:
import pandas as pd
d = '''Yr Mo Dy RPT VAL ROS KIL
61 1 1 15.04 14.96 13.17 9.29
61 1 2 14.71 NaN 10.83 6.50
61 1 3 18.50 16.88 12.33 10.13 '''
data = pd.read_csv(pd.compat.StringIO(d), sep='\s+')
dtime = pd.to_datetime({k:data[c]+v for c,k,v in zip(('Yr', 'Mo', 'Dy'), ('Year', 'Month', 'Day'), (1900, 0, 0))})
print(dtime)
Output:
0 1961-01-01
1 1961-01-02
2 1961-01-03
dtype: datetime64[ns]
In the above code, instead of adding the appropriately named columns to the dataframe data
, I just made a dict where the key/value pairs are eg. ('Year', data['Yr'])
, and also added 1900
to the years.
You can simplify the dict comprehension a bit by just adding 1900
directly to the appropriate column:
data['Yr'] += 1900
dtime = pd.to_datetime({k:data[c] for c,k in zip(('Yr', 'Mo', 'Dy'), ('year', 'month', 'day'))})
This code will have the same output as the previous.
Upvotes: 1
Reputation: 326
I don't really know how Python deals with years, but the reason it wasn't working had to do with the fact that you were using the year 61.
This works for me
d = {'Day': ["1", "2","3"],
'Month': ["1", "1","1"],
'Year':["61", "61", "61"]}
df = pd.DataFrame(data=d)
df["Year"] = pd.to_numeric(df["Year"])
df.Year = df.Year+2000
df['Date'] = pd.to_datetime(df[['Year','Month','Day']], format='%Y%m%d')
Upvotes: 0