Sas_learner
Sas_learner

Reputation: 13

using pd.to_datetime to form a date by taking input of year,months,day present in different columns in a data frame

I have a problem combining the day month year columns to form a date column in a data frame using pd.to_datetime. Below is the dataframe,i'm working on and the columns Yr,Mo,Dy represents as year month day.

data = pd.read_table("/ALabs/wind.data",sep = ',')
Yr  Mo  Dy  RPT     VAL     ROS    KIL      
61  1   1   15.04   14.96   13.17  9.29
61  1   2   14.71   NaN     10.83  6.50 
61  1   3   18.50   16.88   12.33   10.13   

So I've tried the below code, i get the following error: "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing"

Step 1:

data['Date'] = pd.to_datetime(data[['Yr','Mo','Dy']],format="%y-%m-%d")

Next I've tried converting Yr,Mo,Dy column datatype to datetime64 from int64 and assigning the result to new columns Year,Month,Day. Now when i try to combine the columns i'm getting the proper date format in the new date column and i have no idea how i got the desired result.

Step2:

data['Year'] = pd.to_datetime(data.Yr,format='%y').dt.year
data['Month'] = pd.to_datetime(data.Mo,format='%m').dt.month
data['Day'] = pd.to_datetime(data.Dy,format ='%d').dt.day
data['Date'] =pd.to_datetime(data[['Year','Month','Day']])

Result:

Yr  Mo  Dy  Year Month Day  Date
61  1   1   2061    1   1   2061-01-01
61  1   2   2061    1   2   2061-01-02
61  1   3   2061    1   3   2061-01-03
61  1   4   2061    1   4   2061-01-04

But if i try doing the same method by changing the column names from year,month, day to Yy,Mh,Di like in the below code. I get the same error "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing"

Step3:

data['Yy'] = pd.to_datetime(data.Yr,format='%y').dt.year
data['Mh'] = pd.to_datetime(data.Mo,format='%m').dt.month
data['Di'] = pd.to_datetime(data.Dy,format ='%d').dt.day
data['Date'] =pd.to_datetime(data[['Yy','Mh','Di']]) 

What i want to know :

1) Is it mandatory for the argument names to be 'Year' 'Month' 'Day' if we are using pd.to_datetime?

2) Is there any other way to combine the columns in a dataframe to form a date, rather than using this long method?

3) Is this error specific only to python version 3.7??

4)where have i gone wrong in Step 1 and Step 3 ,and why i'm getting o/p when i follow step 2 ?

Upvotes: 1

Views: 1488

Answers (2)

tel
tel

Reputation: 13999

As per the pandas.to_datetime docs, the column names really do have to be 'year', 'month', and 'day' (capitalizing the first letter is fine). This explains the answer to all of your questions, and no it has nothing to do with the version of Python (and all recent versions of Pandas behave the same).

Also, you should be aware that when you call to_datetime with a sequence of columns (as opposed to a single column/list of strings), the format argument seems to be ignored. So you'll need to normalize your years (to 1961 or 2061 or 1061, etc) yourself. Here's a complete example of how you could do the conversion in a single line:

import pandas as pd

d = '''Yr  Mo  Dy  RPT     VAL     ROS    KIL      
61  1   1   15.04   14.96   13.17  9.29
61  1   2   14.71   NaN     10.83  6.50 
61  1   3   18.50   16.88   12.33   10.13  '''
data = pd.read_csv(pd.compat.StringIO(d), sep='\s+')

dtime = pd.to_datetime({k:data[c]+v for c,k,v in zip(('Yr', 'Mo', 'Dy'), ('Year', 'Month', 'Day'), (1900, 0, 0))})
print(dtime)

Output:

0   1961-01-01
1   1961-01-02
2   1961-01-03
dtype: datetime64[ns]

In the above code, instead of adding the appropriately named columns to the dataframe data, I just made a dict where the key/value pairs are eg. ('Year', data['Yr']), and also added 1900 to the years.

You can simplify the dict comprehension a bit by just adding 1900 directly to the appropriate column:

data['Yr'] += 1900
dtime = pd.to_datetime({k:data[c] for c,k in zip(('Yr', 'Mo', 'Dy'), ('year', 'month', 'day'))})

This code will have the same output as the previous.

Upvotes: 1

Martijn van Amsterdam
Martijn van Amsterdam

Reputation: 326

I don't really know how Python deals with years, but the reason it wasn't working had to do with the fact that you were using the year 61.

This works for me

d = {'Day': ["1", "2","3"],
     'Month': ["1", "1","1"],
     'Year':["61", "61", "61"]}
df = pd.DataFrame(data=d)
df["Year"] = pd.to_numeric(df["Year"])
df.Year = df.Year+2000
df['Date'] = pd.to_datetime(df[['Year','Month','Day']], format='%Y%m%d')

Upvotes: 0

Related Questions