This is my function for adding time columns, derived from the timestamp, to a DataFrame:
from datetime import datetime

import pandas as pd


def arrangeData(df):
    hour_from_timestamp_list = []
    date_from_timestamp_list = []
    for row in df.itertuples():
        # timestamps are milliseconds since the epoch
        timestamp = row.timestamp
        hour_from_timestamp = datetime.fromtimestamp(
            int(timestamp) / 1000).strftime('%H:%M:%S')
        date_from_timestamp = datetime.fromtimestamp(
            int(timestamp) / 1000).strftime('%d-%m-%Y')
        hour_from_timestamp_list.append(hour_from_timestamp)
        date_from_timestamp_list.append(date_from_timestamp)
    df['Time'] = hour_from_timestamp_list
    # parse the 'HH:MM:SS' strings again just to read the hour
    df['Hour'] = pd.to_datetime(df['Time'], format='%H:%M:%S').dt.hour
    df['ChatDate'] = date_from_timestamp_list
    return df
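One detail of the loop above: datetime.fromtimestamp is called without a tz argument, so it converts to the machine's local time zone, whereas pd.to_datetime(..., unit='ms') on its own yields naive timestamps in UTC. If the local-time behaviour matters, a vectorized version has to name the zone explicitly. A minimal sketch, assuming the chats should be reported in Europe/Berlin time (a hypothetical choice; substitute your own zone) and using the timestamp from the sample document below:

```python
import pandas as pd

# Hypothetical input: one millisecond timestamp taken from the sample document.
df = pd.DataFrame({'timestamp': [1535019434998]})

# Assumption: report times in Europe/Berlin.
# utc=True marks the parsed values as UTC; tz_convert then shifts them to the zone.
local = pd.to_datetime(df['timestamp'], unit='ms', utc=True).dt.tz_convert('Europe/Berlin')

df['Time'] = local.dt.strftime('%H:%M:%S')
df['Hour'] = local.dt.hour
df['ChatDate'] = local.dt.strftime('%d-%m-%Y')
print(df)
```

In August 2018 Berlin is on summer time (UTC+2), so the row shows 12:17:14 rather than the UTC value 10:17:14.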
I'm trying to extract the time, the hour and the chat date from the timestamp. The code works, but on a large data set (around 300,000 rows) the function is extremely slow. Can anyone suggest a faster way to do this?
For the loop I also tried iterrows(), which was even slower.
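iterrows() is slower because it builds a pandas Series for every row, while itertuples() builds lighter namedtuples. If a loop is kept at all, it can be trimmed further: iterating over the column values avoids per-row objects, and converting each timestamp once (instead of twice, plus re-parsing the 'Time' strings for the hour) cuts the work per row. The function name arrange_data_loop is hypothetical, and the conversion is pinned to UTC so the result is reproducible; this is a sketch, and it is still a Python-level loop, so a whole-column pandas conversion should remain faster.

```python
from datetime import datetime, timezone

import pandas as pd


def arrange_data_loop(df):
    """Hypothetical helper: still a loop, but one datetime per row and no row objects."""
    times, hours, dates = [], [], []
    # Iterate over the raw column values instead of building a tuple per row.
    for ts in df['timestamp'].tolist():
        # Assumption: pinned to UTC for reproducibility; drop tz= to get the
        # local-time behaviour of the original datetime.fromtimestamp call.
        dt = datetime.fromtimestamp(int(ts) / 1000, tz=timezone.utc)
        times.append(dt.strftime('%H:%M:%S'))
        hours.append(dt.hour)  # read the hour directly, no string re-parsing
        dates.append(dt.strftime('%d-%m-%Y'))
    df['Time'] = times
    df['Hour'] = hours
    df['ChatDate'] = dates
    return df


df = arrange_data_loop(pd.DataFrame({'timestamp': [1535019434998]}))
print(df)
```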
This is a sample MongoDB document from the data I'm processing:
{
  "_id" : ObjectId("5b9feadc32214d2b504ea6e1"),
  "id" : 34176,
  "timestamp" : NumberLong(1535019434998),
  "platform" : "Email",
  "sessionId" : LUUID("08a5caac-baa3-11e8-a508-106530216ef0"),
  "intentStatus" : "NotHandled",
  "botId" : "tony"
}
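To work on such documents in pandas they first need to become a DataFrame with a numeric timestamp column. A minimal sketch, assuming the documents have already been fetched as a list of Python dicts (for example with pymongo's find()); only the field values are copied from the sample, and the ObjectId and LUUID fields are left out for brevity:

```python
import pandas as pd

# Hypothetical documents shaped like the sample above, already decoded to
# plain Python types; _id and sessionId are omitted here.
docs = [
    {'id': 34176, 'timestamp': 1535019434998, 'platform': 'Email',
     'intentStatus': 'NotHandled', 'botId': 'tony'},
]

df = pd.DataFrame(docs)
# Store the millisecond timestamps as a 64-bit integer column, so pandas can
# convert the whole column at once instead of row by row.
df['timestamp'] = df['timestamp'].astype('int64')
print(df.dtypes)
```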
I believe you can drop the loop and convert the whole column at once with pandas' vectorized datetime functions:
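import pandas as pd

# Thanks @Chris A for this solution: the timestamps are milliseconds
# since the epoch, so convert the whole column at once with unit='ms'
t = pd.to_datetime(df['timestamp'].astype('int64'), unit='ms')

# Alternatives: divide by 1000 and pass unit='s'
# (without unit=, pandas would read plain numbers as nanoseconds)
# t = pd.to_datetime(df['timestamp'].astype('int64') / 1000, unit='s')
# t = pd.to_datetime(df['timestamp'].apply(int) / 1000, unit='s')
# a list gives a DatetimeIndex, so wrap it in a Series to keep the .dt accessor
# t = pd.Series(pd.to_datetime([int(x) / 1000 for x in df['timestamp']], unit='s'), index=df.index)

df['Time'] = t.dt.strftime('%H:%M:%S')
df['Hour'] = t.dt.hour
df['ChatDate'] = t.dt.strftime('%d-%m-%Y')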
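A self-contained sketch of the vectorized conversion at the scale mentioned in the question: 300,000 synthetic rows, one second apart, starting at the timestamp from the sample document (the data is made up for illustration). Every column is computed by a single pandas call rather than a Python loop; the results are in UTC.

```python
import numpy as np
import pandas as pd

# Synthetic data: 300,000 millisecond timestamps, one second apart.
n = 300_000
df = pd.DataFrame({'timestamp': 1535019434998 + np.arange(n, dtype='int64') * 1000})

# One vectorized conversion for the whole column; the result is naive UTC.
t = pd.to_datetime(df['timestamp'], unit='ms')

df['Time'] = t.dt.strftime('%H:%M:%S')
df['Hour'] = t.dt.hour
df['ChatDate'] = t.dt.strftime('%d-%m-%Y')
print(df.head())
```

The first row comes out as 10:17:14 on 23-08-2018 and the last as 21:37:13 on 26-08-2018.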