iraciv94
iraciv94

Reputation: 840

From Dataframe to Datestamp python3

recently I faced a really weird csv file with 2 columns (with headers), one for dates and the second one for prices. The time format was "dd.mm.yyyy".

d = {'Date': [31.12.1991, 02.01.1992, 03.01.1992, 06.01.1992], 
     'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)
prices = pd.DataFrame(df['Prices'])
date = pd.DataFrame(df['Date'])
date = date.to_string(header=True)
date = df.to_datetime(utc=True, infer_datetime_format=True)
frame = date.join(values)
print(df)

I tried to make it work by isolating the date column and trying to transform it first into string with the to_string() function and then back to date with the to_datetime but it was no use.

Any suggestions? Thanks in advance

Upvotes: 1

Views: 48

Answers (3)

piRSquared
piRSquared

Reputation: 294218

Interesting way to generalize for whole dataframe

Note This uses errors='ignore' in order to skip columns that might not be suitable for parsing as dates. However, the trade off is that if there is a column that is intended to be parsed as dates but has a bad date value, this approach will leave that column unaltered. The point is to make sure you don't have bad date values.

df.assign(
    **df.select_dtypes(exclude=[np.number]).apply(
        pd.to_datetime, errors='ignore', dayfirst=True
    )
)

        Date  Prices
0 1991-12-31    9.62
1 1992-01-02    9.50
2 1992-01-03    9.73
3 1992-01-06    9.45

Another example

df = pd.DataFrame(dict(
    A=1, B='B', C='6.7.2018', D=1-1j,
    E='1.2.2017', F=pd.Timestamp('2016-08-08')
), [0])

df

   A  B         C       D         E          F
0  1  B  6.7.2018  (1-1j)  1.2.2017 2016-08-08

df.assign(
    **df.select_dtypes(exclude=[np.number]).apply(
        pd.to_datetime, errors='ignore', dayfirst=True
    )
)

   A  B          C       D          E          F
0  1  B 2018-07-06  (1-1j) 2017-02-01 2016-08-08

Setup

borrowed from jezrael

d = {'Date': ['31.12.1991', '02.01.1992', '03.01.1992', '06.01.1992'], 
     'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)

Upvotes: 5

ALollz
ALollz

Reputation: 59519

You could try to parse the dates when you read in the file. You can specify that the format has the day first instead of the month.

import pandas as pd
df = pd.read_csv('test.csv', parse_dates=['Date'], dayfirst=True)

print(df)
#        Date   Prices
#0 1991-12-31     9.62
#1 1992-01-02     9.50
#2 1992-01-03     9.73
#3 1992-01-06     9.45

df.dtypes
#Date      datetime64[ns]
#Prices           float64
#dtype: object

However, your data really need to be clean and properly formatted for this to work:

parse_dates:

  • If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv

Sample Data: test.csv

Date,Prices
31.12.1991,9.62
02.01.1992,9.5
03.01.1992,9.73
06.01.1992,9.45

Upvotes: 3

jezrael
jezrael

Reputation: 862406

I believe need:

d = {'Date': ['31.12.1991', '02.01.1992', '03.01.1992', '06.01.1992'], 
     'Prices': [9.62, 9.5, 9.73, 9.45]}
df = pd.DataFrame(data=d)

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df)
        Date  Prices
0 1991-12-31    9.62
1 1992-01-02    9.50
2 1992-01-03    9.73
3 1992-01-06    9.45

Upvotes: 3

Related Questions