Reputation: 840
recently I faced a really weird csv file with 2 columns (with headers), one for dates and the second one for prices. The time format was "dd.mm.yyyy".
d = {'Date': [31.12.1991, 02.01.1992, 03.01.1992, 06.01.1992],
'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)
prices = pd.DataFrame(df['Prices'])
date = pd.DataFrame(df['Date'])
date = date.to_string(header=True)
date = df.to_datetime(utc=True, infer_datetime_format=True)
frame = date.join(values)
print(df)
I tried to make it work by isolating the date column and trying to transform it first into string with the to_string() function and then back to date with the to_datetime but it was no use.
Any suggestions? Thanks in advance
Upvotes: 1
Views: 48
Reputation: 294218
Note This uses errors='ignore'
in order to skip columns that might not be suitable for parsing as dates. However, the trade off is that if there is a column that is intended to be parsed as dates but has a bad date value, this approach will leave that column unaltered. The point is to make sure you don't have bad date values.
df.assign(
**df.select_dtypes(exclude=[np.number]).apply(
pd.to_datetime, errors='ignore', dayfirst=True
)
)
Date Prices
0 1991-12-31 9.62
1 1992-01-02 9.50
2 1992-01-03 9.73
3 1992-01-06 9.45
Another example
df = pd.DataFrame(dict(
A=1, B='B', C='6.7.2018', D=1-1j,
E='1.2.2017', F=pd.Timestamp('2016-08-08')
), [0])
df
A B C D E F
0 1 B 6.7.2018 (1-1j) 1.2.2017 2016-08-08
df.assign(
**df.select_dtypes(exclude=[np.number]).apply(
pd.to_datetime, errors='ignore', dayfirst=True
)
)
A B C D E F
0 1 B 2018-07-06 (1-1j) 2017-02-01 2016-08-08
borrowed from jezrael
d = {'Date': ['31.12.1991', '02.01.1992', '03.01.1992', '06.01.1992'],
'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)
Upvotes: 5
Reputation: 59519
You could try to parse the dates when you read in the file. You can specify that the format has the day first instead of the month.
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=['Date'], dayfirst=True)
print(df)
# Date Prices
#0 1991-12-31 9.62
#1 1992-01-02 9.50
#2 1992-01-03 9.73
#3 1992-01-06 9.45
df.dtypes
#Date datetime64[ns]
#Prices float64
#dtype: object
However, your data really need to be clean and properly formatted for this to work:
parse_dates:
- If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv
Sample Data: test.csv
Date,Prices
31.12.1991,9.62
02.01.1992,9.5
03.01.1992,9.73
06.01.1992,9.45
Upvotes: 3
Reputation: 862406
I believe need:
d = {'Date': ['31.12.1991', '02.01.1992', '03.01.1992', '06.01.1992'],
'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df)
Date Prices
0 1991-12-31 9.62
1 1992-01-02 9.50
2 1992-01-03 9.73
3 1992-01-06 9.45
Upvotes: 3