Reputation: 23
I have a timestamp that looks like this: "1994-10-01:00:00:00" and when I've trying with pd.read_csv or pd.read_table to read this dataset, it imports everything including the date column ([0]) but not even as an object. This is part of my code:
namevar = ['timestamp', 'nsub',
'sub_cms', # var 1 [cms]
'sub_gwflow', # var 2 [cfs]
'sub_interflow', # var 3 [cfs]
'sub_sroff', # var 4 [cfs]
....
'subinc_sroff', # var 13
'subinc_tavgc'] # var 14
df = pd.read_csv(root.anima, delimiter='\t', skiprows=1, header=avar+6, index_col=0,
names=namevar, infer_datetime_format=True,
parse_dates=[0])
print(df)
Results in:
nsub sub_cms ... subinc_sroff subinc_tavgc
timestamp
1994-10-01:00:00:00 1 4.4180 ... 0.0 59.11000
1994-10-01:00:00:00 2 2.6690 ... 0.0 89.29000
1994-10-01:00:00:00 3 4.3170 ... 0.0 77.02000
...
2000-09-30:00:00:00 2 2.3580 ... 0.0 0.19570
2000-09-30:00:00:00 3 2.2250 ... 0.0 0.73340
2000-09-30:00:00:00 4 0.8876 ... 0.0 0.07124
[8768 rows x 15 columns]
print(df.dtypes)
Results in:
nsub int64
sub_cms float64
sub_gwflow float64
sub_interflow float64
sub_sroff float64
subinc_actet float64
...
subinc_sroff float64
subinc_tavgc float64
dtype: object
my ultimate goal is that once the timestamp is in the dataframe I could modify it by getting rid of the time, with:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y%m%d', infer_datetime_format=True)
but when I run this now, it is telling me " KeyError: 'timestamp' "
Any help in getting the timestamp in the dataframe is much appreciated.
Upvotes: 0
Views: 2278
Reputation: 3001
I think the issue is that the timestamp is in a non-standard format. There is a colon between the date and time parts. Here is a way to convert the value in the example:
import datetime
# note ':' between date part and time part
raw_timestamp = '1994-10-01:00:00:00'
format_string = '%Y-%m-%d:%H:%M:%S'
result = datetime.datetime.strptime(raw_timestamp, format_string)
print(result)
1994-10-01 00:00:00
You could use pd.to_datetime() with the format_string in this example, to process an entire column of timestamps.
UPDATE
Here is an example that uses a modified version of the original data (timestamp + one column; every entry is unique):
from io import StringIO
import pandas as pd
data = '''timestamp nsub
1994-10-01:00:00:00 1
1994-10-02:00:00:00 2
1994-10-03:00:00:00 3
2000-09-28:00:00:00 4
2000-09-29:00:00:00 5
2000-09-30:00:00:00 6
'''
df = pd.read_csv(StringIO(data), sep='\s+')
df['timestamp'] = pd.to_datetime(df['timestamp'],
format='%Y-%m-%d:%H:%M:%S',
errors='coerce')
print(df, end='\n\n')
print(df.dtypes)
timestamp nsub
0 1994-10-01 1
1 1994-10-02 2
2 1994-10-03 3
3 2000-09-28 4
4 2000-09-29 5
5 2000-09-30 6
timestamp datetime64[ns]
nsub int64
dtype: object
Upvotes: 0
Reputation: 23
As highlighted by @s13wr81, the way bring 'timstamp' into the dataframe as a column was by removing index_col='timestamp' from the statement.
In order to edit timestamp properly, I needed to remove the :Hr:Min:Sec portion of it by using:
df['timestamp'] = df.timestamp.str.split(":", expand=True)
and then to convert timestamp as a Panda datetime I used:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y%m%d', infer_datetime_format=True)
Upvotes: 1
Reputation: 162
df = pd.read_csv(root.anima, delimiter='\t', skiprows=1, header=avar+6, index_col=0,
names=namevar, infer_datetime_format=True,
parse_dates=[0])
I think you are explicitly telling pandas to consider column 0 as the index which happens to be your datetime column.
Kindly try removing the index_col=0
from the pd.read_csv()
and i think it will work.
Upvotes: 0
Reputation: 301
I am not sure about the data, but i think timestamp is not a column, but index. this kind of problem, sometimes happen when we doing grouping. Try :
"timestamp" in df.columns
if the result false, then :
df = df.reset_index()
next to strip time in timestamp try :
df['timestamp'] = pd.to_datetime(df.timestamp,unit='ns')
Upvotes: 1