Reputation: 972
I have a file where each row has this format:
YYYY-MM-DD-HH-MM-SS uint64 float64 float64 uint64
I've read it with:
pd.read_csv('file.txt', sep=' ', header=None, index_col=0, names= ('C1', 'C2', 'C3', 'C4'), use_unsigned=True, parse_dates=True, infer_datetime_format=True)
The datetimes constructed are not correct. Can I specify the exact format?
Upvotes: 10
Views: 24544
Reputation: 1217
New shorter solution
Since Pandas 2.0.0 there is a direct way to import dates with specific formats using parse_dates to specify the date-columns and date_format to specify the format.
Example csv
Based on your import I created the example file test.csv with the following content:
datetime float_col int_col
2023-09-14-15-00-00 13.2 7
2023-09-14-15-12-03 13.4 8
Direct one-line-solution - Import statement:
df = pd.read_csv('test.csv', sep=' ', parse_dates = [0], date_format = '%Y-%m-%d-%H-%M-%S')
Import result:
Resulting dtypes:
datetime datetime64[ns]
float_col float64
int_col int64
Explanation
parse_dates is a list of column positions with dates. Since the date column is at the 1st position. It is the position 0. For date_format the usual Python format definitions are used.
Upvotes: 5
Reputation: 139172
You can pass a function that parses the correct format to the date_parser
kwarg of read_csv
, but another option is to not parse the dates when reading, but afterwards with to_datetime
(this functions allows to specify a format, and will be faster than a custom date_parser
function):
df = pd.read_csv('file.txt', sep=' ', header=None, index_col=0, names= ('C1', 'C2', 'C3', 'C4'), use_unsigned=True)
df.index = pd.to_datetime(df.index, format="%Y-%m-%d-%H-%M-%S")
Upvotes: 18
Reputation: 972
I have found this method.
f = lambda s: datetime.datetime.strptime(s,'%Y-%m-%d-%H-%M-%S')
pd.read_csv('file.txt', sep=' ', header=None, index_col=0, names= ('C1', 'C2', 'C3', 'C4'), use_unsigned=True, date_parser=f)
that worked
Upvotes: 8