Reputation: 299
My cvs file has Date and time columns. after I import csv file to python, the data frame looks like:
name date time price ....
vod 01-Nov-13 00:22.9 99
vod 01-Nov-13 00:23.0 98
vod 02-Nov-13 00:22.9 96
vod 02-Nov-13 00:24.1 92
... .... .... ...
vod 31-Dec-13 00:22.9 99
first of all, there something wrong with time column. This is trading data, so the time should be from 7:00 to 16:30. It seems python misread the time format.
However,after I sorted the data by date. The data was like
name date ....
vod 01-Nov-13
vod 01-Dec-13
vod 02-Nov-13
vod 02-Dec-13
vod 03-Nov-13
vod 03-Dec-13
again, some wrong with date format. The time shouldn't be in this order...
I just have one question how to define/format date and time in python(pandas), so that I can sort data in order.
Upvotes: 1
Views: 1052
Reputation: 187
So you have two question.
First, time column is wrong. By the comment you said that the csv file is exactly what you post. Which means the data you get are something like 00:22.9
, but you think it's wrong. Since we don't have a corresponding data that could map to the actually time, so you might need to check that if the time 00:22.9
is elapsed time or whatever(00:22.9
might represent 07:22.9
?).
Second, the date string sort problem. Here I transform all the data into a dict list as below: (the data is built by myself)
data =
[{'date': '01-Nov-13', 'name': 'vod', 'price': '98', 'time': '00:23.0'},
{'date': '01-Jan-13', 'name': 'vod', 'price': '91', 'time': '00:23.0'},
{'date': '02-Nov-13', 'name': 'vod', 'price': '96', 'time': '00:22.0'},
{'date': '01-Dec-13', 'name': 'vod', 'price': '101', 'time': '00:23.0'}]
then with the sorted
function and datetime
module (remember the sorted
function change the data
it self)
from datetime import datetime
sorted(data, key=lambda each_dict: datetime.strptime(each_dict['date'], '%d-%b-%y'))
the %d-%b-%y
is the mask corresponding to 01-Nov-13
(day-month-year), check here for other masks.
Edit: if you just want to parse the date data, just simply call:
datetime.strptime(<date_input>, '%d-%b-%y')
this will return a datetime object like:
datetime.datetime(2013, 11, 1, 0, 0)
The whole sorted data is:
[{'date': '01-Jan-13', 'name': 'vod', 'price': '91', 'time': '00:23.0'},
{'date': '01-Nov-13', 'name': 'vod', 'price': '98', 'time': '00:23.0'},
{'date': '02-Nov-13', 'name': 'vod', 'price': '96', 'time': '00:22.0'},
{'date': '01-Dec-13', 'name': 'vod', 'price': '101', 'time': '00:23.0'}]
The only problem is, it might take some time/memory if you got lots of data set to sort
Upvotes: 1
Reputation: 1718
Ok, while we don't know how your time column in the csv looks like, I'll assume it is in hh:mm:ss format:
name, date, time, price
vod, 01-Nov-13, 07:40:00, 99
vod, 01-Nov-13, 07:34:50, 95
Have a look at the following minimal example. It uses the proposed csv input, combines the date and time column into one datetime object. With the column as dtype datetime64[ns], pandas can sort the data easily.
#!/usr/bin/python
import pandas as pd
# just write a short example.csv to test this.
with open('example.csv', 'w') as f:
f.write('name, date, time, price\n')
f.write('vod, 01-Nov-13, 07:40:00, 99\nvod, 01-Nov-13, 07:34:50, 95')
# read the csv and use the parse_dates option of pd.read_csv to
# format and combine the date and time columns
pdf = pd.read_csv('example.csv', parse_dates=[[1, 2]])
# for conveniency: rename the column to datetime
pdf.rename(columns={' date_ time': 'datetime'}, inplace=True)
# Sort the dataframe according to the datetime column
pdf.sort(columns='datetime', inplace=True)
For more details on the options of parse_dates have a look at the doc: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
If the time format in your csv is something parse_dates can't handle, you could use the converters
option in pd.read_csv
- but for this we need to know how your raw data actually looks like.
Upvotes: 1
Reputation: 926
You may use
from Python datetime module datetime object
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])
Upvotes: 1