Sofia693

Reputation: 65

How can I convert a date variable into "int" in Python?

I am working on a regression problem. The data is a CSV file with three columns, where the second column contains the dates. I want to convert each date (format: 1/1/2015 12:00:00) into an int (112015120000) so that I can normalize it and apply my model. I proceeded this way:

import pandas as pd

data_set = pd.read_csv('train.csv')
date = data_set['Date']  # 'Date' is the header of the dates column
dates = date.values
# Strip the separators step by step: "/", then ":", then spaces
date1 = [d.replace("/", "") for d in dates]
date2 = [d.replace(":", "") for d in date1]
date_train = [d.replace(" ", "") for d in date2]

But this feels time-consuming and inefficient. Is there a shorter way to do it? Otherwise, is it possible to apply the normalization directly to a datetime type?

Upvotes: 5

Views: 17106

Answers (3)

jose_bacoy

Reputation: 12684

Use a regular expression (re): replace every character that is not a digit (0 to 9) with an empty string.

import re
d = '1/1/2015 12:00:00'
new = re.sub('[^0-9]', '', str(d))
print(int(new))

Result: 112015120000
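One caveat with stripping digits from an unpadded date: different dates can collapse to the same number. If a zero-padded, sortable integer is wanted instead, a minimal sketch is to parse the string first and then reformat it. The day/month/year order below is an assumption, since the question does not say which comes first.

```python
import re
from datetime import datetime

# Two different dates produce the same digits once the separators are gone.
a = re.sub(r'\D', '', '1/11/2015 12:00:00')
b = re.sub(r'\D', '', '11/1/2015 12:00:00')
print(a == b)  # True

# Parse, then emit fixed-width fields (year first) so every date maps to a
# unique, chronologically sortable integer.
# Assumption: the input is day/month/year.
raw = '1/1/2015 12:00:00'
parsed = datetime.strptime(raw, '%d/%m/%Y %H:%M:%S')
as_int = int(parsed.strftime('%Y%m%d%H%M%S'))
print(as_int)  # 20150101120000
```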

Upvotes: 0

Vipin Mohan

Reputation: 1631

I suggest converting to a Unix timestamp instead of a concatenated int; it's cleaner and universally accepted.

import time
timestamp = time.mktime(time.strptime('1/1/2015 12:00:00', '%d/%m/%Y %H:%M:%S'))

The result is a float timestamp (seconds since the epoch, with the input interpreted in the local time zone), which can easily be converted to int. All major languages support conversion to and from timestamps.
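Since the question is ultimately about normalization, here is a rough sketch of how timestamps could be min-max scaled for a model. It uses calendar.timegm rather than time.mktime so the numbers do not depend on the machine's time zone. The sample dates are made up, and the day/month/year order is an assumption.

```python
import calendar
import time

FMT = '%d/%m/%Y %H:%M:%S'  # assumption: day/month/year order

# Hypothetical sample values standing in for the 'Date' column.
raw_dates = ['1/1/2015 12:00:00', '1/1/2015 13:00:00', '2/1/2015 12:00:00']

# calendar.timegm treats the parsed struct_time as UTC and returns an int.
stamps = [calendar.timegm(time.strptime(d, FMT)) for d in raw_dates]

# Min-max scaling to the [0, 1] range.
lo, hi = min(stamps), max(stamps)
normalized = [(s - lo) / (hi - lo) for s in stamps]

print(stamps)      # [1420113600, 1420117200, 1420200000]
print(normalized)
```

Unlike the concatenated-digits form, differences between these values are real elapsed seconds, which is usually what a regression model should see.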

Upvotes: 0

YOLO

Reputation: 21719

You can do:

df['new_date'] = df['date'].str.replace(r'\D', '', regex=True).astype(int)

Explanation:

1. '\D' matches every non-digit character, and str.replace substitutes each match with ''.
2. Finally, we convert the resulting string to an integer with astype.

Here's a dummy example:

import pandas as pd

df = pd.DataFrame({'date' : pd.date_range('10/1/2018', periods=10, freq='H')})
df['date'] = df['date'].astype(str)  # datetime -> 'YYYY-MM-DD HH:MM:SS' strings
df['new_date'] = df['date'].str.replace(r'\D', '', regex=True).astype(int)

    date                    new_date
0   2018-10-01 00:00:00     20181001000000
1   2018-10-01 01:00:00     20181001010000
2   2018-10-01 02:00:00     20181001020000
3   2018-10-01 03:00:00     20181001030000
4   2018-10-01 04:00:00     20181001040000
5   2018-10-01 05:00:00     20181001050000
6   2018-10-01 06:00:00     20181001060000
7   2018-10-01 07:00:00     20181001070000
8   2018-10-01 08:00:00     20181001080000
9   2018-10-01 09:00:00     20181001090000
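For the unpadded format in the question (1/1/2015 12:00:00), one possible variant is to parse the column with pd.to_datetime first and then format it, so the digits come out zero-padded as in the table above. This is only a sketch: the sample rows are invented, and the day/month/year order is an assumption.

```python
import pandas as pd

# Hypothetical rows in the question's format.
df = pd.DataFrame({'Date': ['1/1/2015 12:00:00', '15/3/2015 08:30:00']})

# Assumption: day comes before month in the source data.
parsed = pd.to_datetime(df['Date'], format='%d/%m/%Y %H:%M:%S')

# Format as fixed-width digits, then cast to 64-bit integers
# (14-digit values do not fit in 32 bits).
df['date_int'] = parsed.dt.strftime('%Y%m%d%H%M%S').astype('int64')
print(df)
```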

Upvotes: 4
