xxx222

Reputation: 3244

How to parse a CSV file using pandas?

Now I have a .csv file with a time column containing values such as "20140203 00:00:03.132". How can I drop the seconds part (":03.132") efficiently? The data set is huge, and I tried preprocessing the data with sed, but it was too slow!

I am now trying to parse the .csv file in pandas. Is there any way I could handle this efficiently? Methods other than pandas are also welcome!

Upvotes: 0

Views: 1058

Answers (2)

mhawke

Reputation: 87124

Take a look at the date_parser parameter of pandas.read_csv(). Something along these lines should work:

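import dateutil.parser  # import the submodule explicitly so dateutil.parser is available
from pandas import read_csv

def my_date_parser(seq):
    # keep only the first 14 characters, "YYYYmmdd HH:MM", dropping ":SS.fff"
    return [dateutil.parser.parse(s[:14]) for s in seq]

# column 3 (zero-based) holds the timestamps; note: date_parser is deprecated since pandas 2.0
df = read_csv('file.csv', parse_dates=[3], date_parser=my_date_parser)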

You will probably also need to supply the parse_dates parameter to specify which column(s) contain the date strings; the example above treats column 3 (counting from zero) as the date column.
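A vectorized alternative, sketched under assumptions: the timestamp column is named `time` (hypothetical) and the sample rows are made up. Instead of calling a Python function per value through date_parser, this reads the column as plain strings, applies the same 14-character trim to the whole column with `.str[:14]`, and then parses it with `pd.to_datetime` and an explicit `format`, so pandas does not have to infer the layout row by row. This swaps date_parser for a conversion after reading; it is not what the answer above does.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice pass the path to your CSV file instead.
SAMPLE = io.StringIO(
    "id,time,value\n"
    "1,20140203 00:00:03.132,10\n"
    "2,20140203 00:01:59.999,11\n"
)

# Read the (hypothetical) "time" column as plain strings so pandas leaves it alone.
df = pd.read_csv(SAMPLE, dtype={"time": str})

# Vectorized slice keeps "YYYYmmdd HH:MM" (the first 14 characters),
# then an explicit format avoids per-row format inference.
df["time"] = pd.to_datetime(df["time"].str[:14], format="%Y%m%d %H:%M")

print(df)
```

If you only need the trimmed text rather than datetime values, you can stop after the `.str[:14]` step.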

Upvotes: 1

olofom

Reputation: 6491

The standard library's datetime module is handy for parsing timestamps:

import datetime

x = '20140203 00:00:03.132'
timestamp = datetime.datetime.strptime(x, '%Y%m%d %H:%M:%S.%f')  # parse the full timestamp
print(timestamp.strftime('%Y%m%d %H:%M'))  # 20140203 00:00
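The same parse-then-truncate idea can be applied to a whole column at once in pandas. This is a minimal sketch with a made-up sample Series standing in for the CSV's time column: `pd.to_datetime` parses every value with one explicit format, `.dt.floor("min")` drops the seconds and fractional part while keeping datetime values, and `.dt.strftime` turns them back into text if you need strings.

```python
import pandas as pd

# Hypothetical sample values standing in for the CSV's time column.
raw = pd.Series(["20140203 00:00:03.132", "20140203 00:01:59.999"])

# Parse the full timestamp, vectorized over the whole column.
parsed = pd.to_datetime(raw, format="%Y%m%d %H:%M:%S.%f")

# Floor to the minute: seconds and fractions are dropped (not rounded).
minutes = parsed.dt.floor("min")

# Format back to text only if strings are needed downstream.
as_text = minutes.dt.strftime("%Y%m%d %H:%M")

print(as_text.tolist())  # ['20140203 00:00', '20140203 00:01']
```

Flooring rather than rounding matches the string-truncation approaches: 00:01:59.999 becomes 00:01, not 00:02.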

Or, since calling strptime on every value is a bit slow for a huge amount of data, you can split the string once from the right on ':' and take the first element of the resulting list:

print(x.rsplit(':', 1)[0])  # 20140203 00:00
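Since the question also welcomes methods other than pandas, here is a minimal standard-library sketch that applies the rsplit trick while streaming a file row by row, so memory use does not grow with file size. Assumptions: the timestamp sits in column index 1 (`TIME_COL` is hypothetical, adjust it to your file), and `drop_seconds` is an illustrative helper name, not an existing API. A header row passes through unchanged, because `rsplit` returns a field without ':' intact.

```python
import csv
import io

TIME_COL = 1  # hypothetical: zero-based index of the timestamp column


def drop_seconds(src, dst, time_col=TIME_COL):
    """Stream CSV rows from src to dst, trimming ':SS.fff' off the time column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        if len(row) > time_col:
            # Split once from the right: '20140203 00:00:03.132' -> '20140203 00:00'
            row[time_col] = row[time_col].rsplit(":", 1)[0]
        writer.writerow(row)


# Demo with in-memory text standing in for real files.
src = io.StringIO("id,time,value\n1,20140203 00:00:03.132,10\n2,20140203 00:01:59.999,11\n")
dst = io.StringIO()
drop_seconds(src, dst)
print(dst.getvalue())
```

For real files, open both with `newline=""` as the csv module recommends, e.g. `with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst: drop_seconds(src, dst)`.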

Upvotes: 1
