Opriv
Opriv

Reputation: 23

timestamp is not read by Pandas

I have a timestamp that looks like this: "1994-10-01:00:00:00" and when I've trying with pd.read_csv or pd.read_table to read this dataset, it imports everything including the date column ([0]) but not even as an object. This is part of my code:

namevar = ['timestamp', 'nsub',
       'sub_cms',             # var 1 [cms]
       'sub_gwflow',          # var 2 [cfs]
       'sub_interflow',       # var 3 [cfs]
       'sub_sroff',           # var 4 [cfs]
       ....
       'subinc_sroff',        # var 13
       'subinc_tavgc']        # var 14           

df = pd.read_csv(root.anima, delimiter='\t', skiprows=1, header=avar+6, index_col=0,
             names=namevar, infer_datetime_format=True,
             parse_dates=[0])

print(df)

Results in:

                     nsub  sub_cms  ...  subinc_sroff  subinc_tavgc
timestamp

1994-10-01:00:00:00     1   4.4180  ...           0.0      59.11000
1994-10-01:00:00:00     2   2.6690  ...           0.0      89.29000
1994-10-01:00:00:00     3   4.3170  ...           0.0      77.02000
...
2000-09-30:00:00:00     2   2.3580  ...           0.0       0.19570
2000-09-30:00:00:00     3   2.2250  ...           0.0       0.73340
2000-09-30:00:00:00     4   0.8876  ...           0.0       0.07124
[8768 rows x 15 columns]


print(df.dtypes)

Results in:

nsub                  int64
sub_cms             float64
sub_gwflow          float64
sub_interflow       float64
sub_sroff           float64
subinc_actet        float64
...
subinc_sroff        float64
subinc_tavgc        float64
dtype: object

my ultimate goal is that once the timestamp is in the dataframe I could modify it by getting rid of the time, with:

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y%m%d', infer_datetime_format=True)

but when I run this now, it is telling me " KeyError: 'timestamp' "

Any help in getting the timestamp in the dataframe is much appreciated.

Upvotes: 0

Views: 2278

Answers (4)

jsmart
jsmart

Reputation: 3001

I think the issue is that the timestamp is in a non-standard format. There is a colon between the date and time parts. Here is a way to convert the value in the example:

import datetime

# note ':' between date part and time part
raw_timestamp = '1994-10-01:00:00:00'
format_string = '%Y-%m-%d:%H:%M:%S'
result = datetime.datetime.strptime(raw_timestamp, format_string)

print(result)
1994-10-01 00:00:00

You could use pd.to_datetime() with the format_string in this example, to process an entire column of timestamps.

UPDATE

Here is an example that uses a modified version of the original data (timestamp + one column; every entry is unique):

from io import StringIO
import pandas as pd

data = '''timestamp  nsub  
1994-10-01:00:00:00  1   
1994-10-02:00:00:00  2   
1994-10-03:00:00:00  3   
2000-09-28:00:00:00  4   
2000-09-29:00:00:00  5   
2000-09-30:00:00:00  6   
'''
df = pd.read_csv(StringIO(data), sep='\s+')
df['timestamp'] = pd.to_datetime(df['timestamp'], 
                                 format='%Y-%m-%d:%H:%M:%S', 
                                 errors='coerce')
print(df, end='\n\n')
print(df.dtypes)

   timestamp  nsub
0 1994-10-01     1
1 1994-10-02     2
2 1994-10-03     3
3 2000-09-28     4
4 2000-09-29     5
5 2000-09-30     6

timestamp    datetime64[ns]
nsub                  int64
dtype: object

Upvotes: 0

Opriv
Opriv

Reputation: 23

As highlighted by @s13wr81, the way bring 'timstamp' into the dataframe as a column was by removing index_col='timestamp' from the statement.

In order to edit timestamp properly, I needed to remove the :Hr:Min:Sec portion of it by using:

df['timestamp'] = df.timestamp.str.split(":", expand=True)

and then to convert timestamp as a Panda datetime I used:

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y%m%d', infer_datetime_format=True)

Upvotes: 1

s13rw81
s13rw81

Reputation: 162

df = pd.read_csv(root.anima, delimiter='\t', skiprows=1, header=avar+6, index_col=0,
         names=namevar, infer_datetime_format=True,
         parse_dates=[0])

I think you are explicitly telling pandas to consider column 0 as the index which happens to be your datetime column.

Kindly try removing the index_col=0 from the pd.read_csv() and i think it will work.

Upvotes: 0

Fariliana Eri
Fariliana Eri

Reputation: 301

I am not sure about the data, but i think timestamp is not a column, but index. this kind of problem, sometimes happen when we doing grouping. Try :

"timestamp" in df.columns

if the result false, then :

df = df.reset_index()

next to strip time in timestamp try :

df['timestamp'] = pd.to_datetime(df.timestamp,unit='ns')

Upvotes: 1

Related Questions