Tristan Sun
Tristan Sun

Reputation: 299

How to define/format date and time in python

My cvs file has Date and time columns. after I import csv file to python, the data frame looks like:

name      date       time      price    ....
vod     01-Nov-13   00:22.9     99
vod     01-Nov-13   00:23.0     98
vod     02-Nov-13   00:22.9     96
vod     02-Nov-13   00:24.1     92
...      ....        ....       ...
vod     31-Dec-13   00:22.9     99

first of all, there something wrong with time column. This is trading data, so the time should be from 7:00 to 16:30. It seems python misread the time format.

However,after I sorted the data by date. The data was like

name      date         ....
vod     01-Nov-13       
vod     01-Dec-13       
vod     02-Nov-13     
vod     02-Dec-13    
vod     03-Nov-13 
vod     03-Dec-13 

again, some wrong with date format. The time shouldn't be in this order...

I just have one question how to define/format date and time in python(pandas), so that I can sort data in order.

Upvotes: 1

Views: 1052

Answers (3)

Jkm
Jkm

Reputation: 187

So you have two question.

First, time column is wrong. By the comment you said that the csv file is exactly what you post. Which means the data you get are something like 00:22.9, but you think it's wrong. Since we don't have a corresponding data that could map to the actually time, so you might need to check that if the time 00:22.9 is elapsed time or whatever(00:22.9 might represent 07:22.9?).

Second, the date string sort problem. Here I transform all the data into a dict list as below: (the data is built by myself)

data =
[{'date': '01-Nov-13', 'name': 'vod', 'price': '98', 'time': '00:23.0'},
{'date': '01-Jan-13', 'name': 'vod', 'price': '91', 'time': '00:23.0'},
{'date': '02-Nov-13', 'name': 'vod', 'price': '96', 'time': '00:22.0'},
{'date': '01-Dec-13', 'name': 'vod', 'price': '101', 'time': '00:23.0'}]

then with the sorted function and datetime module (remember the sorted function change the data it self)

from datetime import datetime
sorted(data, key=lambda each_dict: datetime.strptime(each_dict['date'], '%d-%b-%y'))

the %d-%b-%y is the mask corresponding to 01-Nov-13 (day-month-year), check here for other masks.

  • Edit: if you just want to parse the date data, just simply call:

    datetime.strptime(<date_input>, '%d-%b-%y')
    

    this will return a datetime object like:

    datetime.datetime(2013, 11, 1, 0, 0)
    

The whole sorted data is:

[{'date': '01-Jan-13', 'name': 'vod', 'price': '91', 'time': '00:23.0'},
{'date': '01-Nov-13', 'name': 'vod', 'price': '98', 'time': '00:23.0'},
{'date': '02-Nov-13', 'name': 'vod', 'price': '96', 'time': '00:22.0'},
{'date': '01-Dec-13', 'name': 'vod', 'price': '101', 'time': '00:23.0'}]

The only problem is, it might take some time/memory if you got lots of data set to sort

Upvotes: 1

chris-sc
chris-sc

Reputation: 1718

Ok, while we don't know how your time column in the csv looks like, I'll assume it is in hh:mm:ss format:

name, date, time, price
vod, 01-Nov-13, 07:40:00, 99
vod, 01-Nov-13, 07:34:50, 95

Have a look at the following minimal example. It uses the proposed csv input, combines the date and time column into one datetime object. With the column as dtype datetime64[ns], pandas can sort the data easily.

#!/usr/bin/python                                                                         
import pandas as pd                                                              

# just write a short example.csv to test this.                                                                       
with open('example.csv', 'w') as f:                                              
    f.write('name, date, time, price\n')                                         
    f.write('vod, 01-Nov-13, 07:40:00, 99\nvod, 01-Nov-13, 07:34:50, 95')                     

# read the csv and use the parse_dates option of pd.read_csv to
# format and combine the date and time columns                                                                        
pdf = pd.read_csv('example.csv', parse_dates=[[1, 2]])                            

# for conveniency: rename the column to datetime                         
pdf.rename(columns={' date_ time': 'datetime'}, inplace=True)

# Sort the dataframe according to the datetime column                 
pdf.sort(columns='datetime', inplace=True)

For more details on the options of parse_dates have a look at the doc: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

If the time format in your csv is something parse_dates can't handle, you could use the converters option in pd.read_csv - but for this we need to know how your raw data actually looks like.

Upvotes: 1

mmachine
mmachine

Reputation: 926

You may use

from Python datetime module datetime object
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

Upvotes: 1

Related Questions