Reputation: 303
I am currently working with a data frame of 2.2 million records. It has two columns, membership_id and txn_time. The data frame looks like this:
membership_id txn_time
1 2019-02-17 00:00:00.0
2 2018-04-23 00:00:00.0
3 2018-12-17 00:00:00.0
4 2019-02-17 00:00:00.0
5 2018-04-02 00:00:00.0
6 2018-09-10 06:20:58.0
7 2019-01-16 08:11:42.0
I want the data frame to look like this:
membership_id txn_time
1 2019-02-17
2 2018-04-23
3 2018-12-17
4 2019-02-17
5 2018-04-02
6 2018-09-10
7 2019-01-16
What I have done so far:
df_txn['TXN_DATE'] = pd.to_datetime(df_txn['txn_time'], errors='coerce')
But it's not working, and the number of records is huge (2.2 million).
Thanks in advance.
Upvotes: 1
Views: 87
Reputation: 464
This lambda function solves the problem without using the datetime library, provided txn_time holds strings:
df['txn_time'] = df['txn_time'].apply(lambda x:x.split()[0])
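A minimal sketch of the same idea on two sample rows from the question, assuming txn_time is stored as strings in the layout shown there. It also shows a swapped-in variant using the vectorized `.str` accessor instead of `apply`. The names `via_apply` and `via_str` are illustrative only. Either way the result is still a column of strings, not datetimes.

```python
import pandas as pd

# Two sample rows from the question; txn_time is assumed to be plain strings.
df = pd.DataFrame({"txn_time": ["2019-02-17 00:00:00.0", "2018-09-10 06:20:58.0"]})

# The lambda from the answer: keep the part before the first whitespace.
via_apply = df["txn_time"].apply(lambda x: x.split()[0])

# Variant using the vectorized string accessor (not the answer's code).
via_str = df["txn_time"].str.split().str[0]

print(via_apply.tolist())
print(via_str.tolist())
```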
Upvotes: 0
Reputation: 863511
To improve performance, pass the format parameter, then convert to datetimes with no times using dt.floor. This is the better choice if you need to process the data later with datetime-like functions:
df_txn['TXN_DATE'] = pd.to_datetime(df_txn['txn_time'],
                                    errors='coerce',
                                    format='%Y-%m-%d %H:%M:%S.%f').dt.floor('d')
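For a quick check, here is a small self-contained sketch of the floor approach on three rows copied from the question. The intermediate name `parsed` is illustrative, and the frequency is written as uppercase `'D'`, which should behave the same as `'d'` here.

```python
import pandas as pd

# Three sample rows from the question; txn_time is assumed to be strings.
df_txn = pd.DataFrame({
    "membership_id": [1, 6, 7],
    "txn_time": [
        "2019-02-17 00:00:00.0",
        "2018-09-10 06:20:58.0",
        "2019-01-16 08:11:42.0",
    ],
})

# An explicit format lets pandas skip per-value format inference.
parsed = pd.to_datetime(df_txn["txn_time"],
                        errors="coerce",
                        format="%Y-%m-%d %H:%M:%S.%f")

# Floor to the day: the time part becomes midnight, the dtype stays datetime64.
df_txn["TXN_DATE"] = parsed.dt.floor("D")

print(df_txn[["membership_id", "TXN_DATE"]])
print(df_txn["TXN_DATE"].dtype)
```

Because the column keeps a datetime64 dtype, accessors such as `.dt.year` or `.dt.month` remain available afterwards.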
Or convert to Python date objects using dt.date, but then the column has object dtype:
df_txn['TXN_DATE'] = pd.to_datetime(df_txn['txn_time'],
                                    errors='coerce',
                                    format='%Y-%m-%d %H:%M:%S.%f').dt.date
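A small sketch of what the dt.date variant produces, using two values from the question plus one deliberately invalid string (an assumption added here to show what errors='coerce' does). The names `s` and `dates` are illustrative.

```python
import datetime
import pandas as pd

# Two values from the question and one made-up invalid entry.
s = pd.Series(["2018-04-23 00:00:00.0", "2018-09-10 06:20:58.0", "not a timestamp"])

dates = pd.to_datetime(s,
                       errors="coerce",
                       format="%Y-%m-%d %H:%M:%S.%f").dt.date

# Valid entries become datetime.date objects; the invalid one is coerced to a missing value.
print(dates.tolist())
print(dates.dtype)
```

Since the values are plain Python objects, the `.dt` accessor is no longer available on this column, which is the trade-off against the dt.floor version.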
Upvotes: 1