Reputation: 35
Is there a way of parsing the column names themselves as datetime.? My column names look like this:
Name SizeRank 1996-06 1996-07 1996-08 ...
I know that I can convert values for a column to datetime values, e.g for a column named datetime, I can do something like this:
temp = pd.read_csv('data.csv', parse_dates=['datetime'])
Is there a way of converting the column names themselves? I have 285 columns i.e my data is from 1996-2019.
Upvotes: 1
Views: 2820
Reputation: 2492
It seems pandas
will accept a datetime object as a column name...
import pandas as pd
from datetime import datetime
import re
columns = ["Name", "2019-01-01","2019-01-02"]
data = [["Tom", 1,0], ["Dick",1,1], ["Harry",0,0]]
df = pd.DataFrame(data, columns = columns)
print(df)
newcolumns = {}
for col in df.columns:
if re.search("\d+-\d+-\d+", col):
newcolumns[col] = pd.to_datetime(col)
else:
newcolumns[col] = col
print(newcolumns)
df.rename(columns = newcolumns, inplace = True)
print("--------------------")
print(df)
print("--------------------")
for col in df.columns:
print(type(col), col)
OUTPUT:
Name 2019-01-01 2019-01-02
0 Tom 1 0
1 Dick 1 1
2 Harry 0 0
{'Name': 'Name', '2019-01-01': Timestamp('2019-01-01 00:00:00'), '2019-01-02': Timestamp('2019-01-02 00:00:00')}
--------------------
Name 2019-01-01 00:00:00 2019-01-02 00:00:00
0 Tom 1 0
1 Dick 1 1
2 Harry 0
--------------------
<class 'str'> Name
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2019-01-01 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2019-01-02 00:00:00
For brevity you can use...
newcolumns = {col:(pd.to_datetime(col) if re.search("\d+-\d+-\d+", col) else col) for col in df.columns}
df.rename(columns = newcolumns, inplace = True)
Upvotes: 0
Reputation: 2569
You could do something like this.
df.columns = df.columns[:2] + pd.to_datetime (df.columns[2:])
Upvotes: 1
Reputation: 373
There's no way of doing that immediately while reading the data from a file afaik, but you can fairly simply convert the columns to datetime after you've read them in. You just need to watch out that you don't pass columns that don't actually contain a date to the function.
Could look something like this, assuming all columns after the first two are dates (as in your example):
dates = pd.to_datetime(df.columns[2:])
You can then do whatever you need to do with those datetimes.
Upvotes: 1