Reputation: 3244
I have a .csv file with a time column containing values such as "20140203 00:00:03.132". How can I drop the seconds part (":03.132") efficiently? The amount of data is huge, and I tried preprocessing it with sed, but that was too slow!
I am now trying to parse the .csv file in pandas. Is there any way I can handle this efficiently? Methods other than pandas are also welcome!
Upvotes: 0
Views: 1058
Reputation: 87124
Take a look at the date_parser parameter to pandas.read_csv(). Something along these lines should work:
import dateutil.parser
from pandas import read_csv

def my_date_parser(seq):
    # Keep only the first 14 characters ("YYYYMMDD HH:MM"), dropping seconds
    return [dateutil.parser.parse(s[:14]) for s in seq]

csv = read_csv('file.csv', parse_dates=[3], date_parser=my_date_parser)
You will probably also need to supply the parse_dates parameter to nail down which column(s) contain the date strings; the example above specifies column 3 as a date column.
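To see what the parser function does on its own, here is a minimal self-contained sketch applying it to a couple of sample strings (the input values are made up for illustration):

```python
import datetime
import dateutil.parser

def my_date_parser(seq):
    # Slice to the first 14 characters ("YYYYMMDD HH:MM") before parsing,
    # which drops the seconds and milliseconds
    return [dateutil.parser.parse(s[:14]) for s in seq]

parsed = my_date_parser(['20140203 00:00:03.132', '20140203 12:34:56.789'])
print(parsed[0])  # 2014-02-03 00:00:00
```

Note that the resulting datetime objects simply have seconds and microseconds equal to zero; if you want the values back as strings, format them with strftime.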
Upvotes: 1
Reputation: 6491
There is a handy module for parsing timestamps: datetime:
import datetime

x = '20140203 00:00:03.132'
timestamp = datetime.datetime.strptime(x, '%Y%m%d %H:%M:%S.%f')
print(timestamp.strftime('%Y%m%d %H:%M'))  # 20140203 00:00
Or, since that is a bit slow for a huge amount of data, you can split the string once from the right on ':' and take the first element of the resulting list:
print(x.rsplit(':', 1)[0])  # 20140203 00:00
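The same trick can be vectorized over a whole pandas column with the .str accessor, which avoids a Python-level loop; a short sketch, where the column name 'time' and the sample values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'time': ['20140203 00:00:03.132', '20140203 12:34:56.789']})
# Split once from the right on ':' and keep everything before the last colon
df['time'] = df['time'].str.rsplit(':', n=1).str[0]
print(df['time'].tolist())  # ['20140203 00:00', '20140203 12:34']
```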
Upvotes: 1