Bart
Bart

Reputation: 10250

Convert date/time columns in Pandas dataframe

I have data sets containing the date (Julian day, column 1), hour (HHMM, column 2) and seconds (column 3) in individual columns:

1  253  2300   0    2.9  114.4  18.42  21.17
1  253  2300  10   3.27  111.2  18.48  21.12
1  253  2300  20   3.22  111.3  18.49  21.09
1  253  2300  30   3.84  106.4  18.52     21
1  253  2300  40   3.75  104.4  18.53  20.85

I'm reading the text file using Pandas as:

columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(file_name, header=None, names=columns, delim_whitespace=True)

Now I want to convert the date to something more convenient like YYYY-MM-DD HH:MM:SS (The year isn't provided in the data set, but is fixed at 2001).

I tried combining the three columns into one using parse_dates:

df = pd.read_table(file_name, header=None, names=columns, delim_whitespace=True, 
                   parse_dates={'datetime' : ['julian_day','hours','seconds']}) 

which converts the three columns into one string:

In [38]: df['datetime'][0]
Out[38]: '253 2300 0'

I next tried to convert them using date_parser; following this post using something like:

date_parser = lambda x: datetime.datetime.strptime(x, '%j %H%M %s')

The date_parser itself works, but I can't get this to combine with read_table, and I'm pretty much stuck at this point. Is there an easy way to achieve the conversion?

The full minimal (not-so) working example:

import pandas as pd
import datetime
from io import StringIO

data_file = StringIO("""\
 1  253  2300   0    2.9  114.4  18.42  21.17
 1  253  2300  10   3.27  111.2  18.48  21.12
 1  253  2300  20   3.22  111.3  18.49  21.09
 1  253  2300  30   3.84  106.4  18.52     21
 1  253  2300  40   3.75  104.4  18.53  20.85
""")

date_parser = lambda x: datetime.datetime.strptime(x, '%j %H%M %S')

columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(data_file, header=None, names=columns, delim_whitespace=True,\
                   parse_dates={'datetime' : ['julian_day','hours','seconds']})

Upvotes: 5

Views: 984

Answers (2)

milos.ai
milos.ai

Reputation: 3930

Not sure if I am missing something but this seems to work:

import pandas as pd
import datetime
from io import StringIO

data_file = StringIO("""\
 1  253  2300   0    2.9  114.4  18.42  21.17
 1  253  2300  10   3.27  111.2  18.48  21.12
 1  253  2300  20   3.22  111.3  18.49  21.09
 1  253  2300  30   3.84  106.4  18.52     21
 1  253  2300  40   3.75  104.4  18.53  20.85
""")

date_parser = lambda x: datetime.datetime.strptime(("2001 " + x), '%Y %j %H%M %S')

columns = ['station','julian_day','hours','seconds','U','Ud','T','RH']
df = pd.read_table(data_file, header=None, names=columns, delim_whitespace=True,\
                   date_parser = date_parser,parse_dates={'datetime' : ['julian_day','hours','seconds']})

I just add the date_parser parameter in read_table and hard codded 2001 in parsing function.

Upvotes: 3

SerialDev
SerialDev

Reputation: 2847

Would something along these lines work? :

def merge_date(df, year='Year', month='Month', day='Day', hours='Hours', seconds='Seconds'):
    """
    * Function: merge_date 
    * Usage: merge_date(DataFrame, col_year, col_month, col_day) . . .
    * -------------------------------
    * This function returns Datetime in the format YYYY-MM-DD from
    * input of a dataframe with columns holding 'Year', 'Month', 'Day' 
    """

    df['DateTime'] = df[[year, month, day, hours, seconds]].apply(lambda s : datetime.datetime(*s),axis = 1)
    return df

use datetime.datetime with argument unpacking for each dataframe column

Upvotes: 1

Related Questions