dektorpan

Reputation: 51

Python Pandas combining timestamp columns and fillna in read_csv

I'm reading a csv file with Pandas. The format is:

Date        Time        x1      x2     x3     x4    x5
3/7/2012    11:09:22    13.5    2.3    0.4    7.3   6.4
                        12.6    3.4    9.0    3.0   7.0
                        3.6     4.4    8.0    6.0   5.0
                        10.6    3.5    1.0    3.0   8.0
...
3/7/2012    11:09:23    10.5    23.2   0.3    7.8   4.4
                        11.6    13.4   19.0   13.0  17.0
...

As you can see, not every row has a timestamp. Every row without a timestamp is from the same 1-second interval as the closest row above it that does have a timestamp.

I am trying to do three things:

1. Combine the Date and Time columns into a single timestamp column.
2. Convert that column to units of seconds.
3. Fill the empty cells with the appropriate timestamp.

The desired end result is an array with the timestamp, in seconds, at each row.

I am not sure how to quickly convert the timestamps into units of seconds, other than to do a slow for loop and use the Python built-in time.mktime method.
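
For reference, the row-by-row version I am trying to avoid looks roughly like this (just a sketch, assuming the Date and Time columns are already in a DataFrame df as strings):

import time

# Slow: parse each Date/Time pair one row at a time and convert it to
# seconds since the Unix epoch with the built-in time.mktime.
seconds = []
for date_str, time_str in zip(df['Date'], df['Time']):
    parts = time.strptime('{} {}'.format(date_str, time_str), '%m/%d/%Y %H:%M:%S')
    seconds.append(time.mktime(parts))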

The other problem comes when I fill in the missing timestamp values: the cells in the Date and Time columns which did not have a timestamp each get a "nan" value, and when merged they produce a cell containing the string "nan nan". When I then use the fillna() method, it does not interpret "nan nan" as missing.
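
To illustrate (a quick check, not part of my actual script): pandas only treats real NaN values as missing, so the merged string survives fillna().

import pandas as pd

# The merged cell holds the literal string 'nan nan', which is not NaN.
pd.isnull('nan nan')   # False, so fillna() leaves it alone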

I am using the following code to get the problem result (not including the part of trying to convert to seconds):

import pandas as pd
df = pd.read_csv('file.csv', delimiter=',',
                 parse_dates={'CorrectTime': [0, 1]},
                 usecols=[0, 1, 2, 4, 6],
                 names=['Date', 'Time', 'x1', 'x3', 'x5'])
df.fillna(method='ffill', axis=0, inplace=True)

Thanks for your help.

Upvotes: 2

Views: 1385

Answers (1)

Paul H

Reputation: 68186

Assuming you want seconds since Jan 1, 1900...

import pandas
from io import StringIO
import datetime
data = StringIO("""\
Date,Time,x1,x2,x3,x4,x5
3/7/2012,11:09:22,13.5,2.3,0.4,7.3,6.4
,,12.6,3.4,9.0,3.0,7.0
,,3.6,4.4,8.0,6.0,5.0
,,10.6,3.5,1.0,3.0,8.0
3/7/2012,11:09:23,10.5,23.2,0.3,7.8,4.4
,,11.6,13.4,19.0,13.0,17.0
""")

# Parse the Date column while reading, then forward-fill the blank
# Date/Time cells so every row carries its group's timestamp.
df = pandas.read_csv(data, parse_dates=['Date']).fillna(method='ffill')

def dealwithdates(row):
    # Rebuild the full datetime from the parsed date plus the time string.
    datestring = row['Date'].strftime('%Y-%m-%d')
    dtstring = '{} {}'.format(datestring, row['Time'])
    date = datetime.datetime.strptime(dtstring, '%Y-%m-%d %H:%M:%S')

    # Seconds elapsed since the reference date (Jan 1, 1900).
    refdate = datetime.datetime(1900, 1, 1)
    return (date - refdate).total_seconds()

df['ordinal'] = df.apply(dealwithdates, axis=1)
print(df)

        Date      Time    x1    x2    x3    x4    x5     ordinal
0 2012-03-07  11:09:22  13.5   2.3   0.4   7.3   6.4  3540107362
1 2012-03-07  11:09:22  12.6   3.4   9.0   3.0   7.0  3540107362
2 2012-03-07  11:09:22   3.6   4.4   8.0   6.0   5.0  3540107362
3 2012-03-07  11:09:22  10.6   3.5   1.0   3.0   8.0  3540107362
4 2012-03-07  11:09:23  10.5  23.2   0.3   7.8   4.4  3540107363
5 2012-03-07  11:09:23  11.6  13.4  19.0  13.0  17.0  3540107363
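
If you want to avoid the per-row apply, a vectorized variant along these lines should produce the same ordinal column (a sketch: it reads the CSV without parse_dates so the two columns can be forward-filled and concatenated as strings before parsing):

import pandas
from io import StringIO

data = StringIO("""\
Date,Time,x1,x2,x3,x4,x5
3/7/2012,11:09:22,13.5,2.3,0.4,7.3,6.4
,,12.6,3.4,9.0,3.0,7.0
3/7/2012,11:09:23,10.5,23.2,0.3,7.8,4.4
,,11.6,13.4,19.0,13.0,17.0
""")

# Keep Date and Time as plain strings, forward-fill the blanks, then parse
# the combined column in one vectorized call.
df = pandas.read_csv(data).fillna(method='ffill')
stamps = pandas.to_datetime(df['Date'] + ' ' + df['Time'])
df['ordinal'] = (stamps - pandas.Timestamp('1900-01-01')).dt.total_seconds()
print(df[['Date', 'Time', 'ordinal']])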

Upvotes: 2
