Reputation: 79
PROBLEM Statement #1 (EASY) I wanted to convert the time column of my dataframe to actual time value like "12:01:45" hh:mm:ss
Have tried :
df_new["time_1"] = pd.to_datetime(df_new.TIME)
This has given me a new row - but its showing the date value not the time. :(
then I tried df_new['Time_1'] = pd.to_datetime(df_new['TIME'], format='%H%M').dt.time
But output is :
ValueError: time data '0' does not match format '%H%M' (match)
expected >> time_1 in hh:mm:ss
sample data :
PRIMARY_KEY DATE TIME LATITUDE LONGITUDE ALTITUDE DATE_1 time_1
0 b04acedfaa0a 43900 0.499964 43.738003 -79.815340 221.0 2020-03-10 1970-01-01
1 efc27d128dac 43900 0.500140 43.736450 -79.813030 221.0 2020-03-10 1970-01-01
2 19b344a8ef88 43900 0.500293 43.730568 -79.821304 216.0 2020-03-10 1970-01-01
3 0c67c3eefb73 43900 0.500431 43.742660 -79.831710 235.0 2020-03-10 1970-01-01
PROBLEM Statement #2 (COMPLEX & Challenging) After converting to actual time, I want to perform some action.
Input will be a primary key value (say 19b344a8ef88)
Operations, I want to make the pairs of primary keys which satisfies two condition :
a) near 50 mtrs distance (by using LAT & LONG) and
b) At nearly same time, say 10 mins of time difference
Output that I am expecting is in the form of :
Primary Key Instances
b04acedfaa0a. 5
efc27d128dac. 3
means total 5 times b04acedfaa0a primary key was 50 mtrs near and at the same time with 19b344a8ef88.
Hope I am able to explain myself...........
Upvotes: 1
Views: 529
Reputation: 635
For your first question, quick fix is to format your data in csv before you import. Before you load your csv into python, you could try changing 'DATE' column to 'Short date' and 'TIME' column to 'Time' in excel and save the csv (rename it so that you don't overwrite your original file) and then try importing in python?
Probably there's a clever way to do this in python.
For your second question, you may be able to get what you are after if you follow these steps:
pd.merge()
df.query()
(i.e. near 50 mtrs distance & max 10 mins of time difference)df['Primary key'].value_counts()
Upvotes: 1
Reputation: 111
When you give to_datetime()
a plain old integer or float value, it will see that value as the number of seconds since "the UNIX epoch", or 1 January 1970. That's why the values in time_1 are all 1970-01-01
. You can read more about it here if you're interested.
To solve your problem, it looks like there are two thing you'll need to address.
First, be sure that the value you're storing in TIME is the right time from the Unix epoch or find another way to store the time.
Once that's worked out, you're storing a Timestamp in time_1, so you can get the time part of it by using something like df_new["time_1"].dt.time
, which should give you something in the form "hh:mm:ss". You can store those values in another column.
Upvotes: 1