PointXIV

Reputation: 1288

Adding '-' into string in dataframe

I have a dataframe with a date column, but the dates are formatted as YYYYMMDD, with no separator.

I tried using a line like this:

df['date'] = [(x[:4] + '-' + x[4:6] + '-' + x[6]) for x in df['date']]

but I keep getting IndexError: invalid index to scalar variable.

From what I've found, that's because I'm applying the operation to a Series in a dataframe. However, I can't find another way to insert a separator between the parts of the date. (Using time.strptime just gave me another error because of the missing separator.)

Is there another way to turn YYYYMMDD into YYYY-MM-DD as well as into a datetime object on an entire series in a dataframe?

EDIT:

Solved by converting the dates from a numpy.float64 into string using

df['date'] = [str(x) for x in df['date']]
df['date'] = df['date'].str.split('.').str[0]

then applying the slicing operation from the top of the question, and finally using strptime to convert the result to a datetime object.
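
For reference, the steps in the edit above can be sketched in plain Python roughly as follows. This is only an illustration under assumptions: raw_dates is a hypothetical stand-in for the values in df['date'], and the last part of the slice is written as x[6:] rather than x[6], since x[6] alone would keep only the first digit of the day.

```python
from datetime import datetime

# Hypothetical sample values standing in for df['date'] (floats, as in the question).
raw_dates = [20140101.0, 20140102.0]

# Step 1: float -> string, dropping the trailing '.0'.
as_text = [str(x).split('.')[0] for x in raw_dates]

# Step 2: insert the separators. x[6:] is a slice, so both day digits are kept;
# x[6] would keep only the first one.
dashed = [x[:4] + '-' + x[4:6] + '-' + x[6:] for x in as_text]

# Step 3: parse the dashed strings into datetime objects.
parsed = [datetime.strptime(x, '%Y-%m-%d') for x in dashed]

print(dashed)
print(parsed)
```

The intermediate dashed strings are only needed if the text form is wanted; strptime can also parse the undashed strings directly with the format '%Y%m%d'.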

Upvotes: 0

Views: 65

Answers (2)

unutbu

Reputation: 879083

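It's possible that you are getting the IndexError because the date column has a numeric dtype such as float64 (the example below uses integers, which fail the same way). Each x is then a NumPy scalar, which cannot be sliced like a string: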

In [138]: df = pd.DataFrame({'date': np.array([20000101, 20000102])})

In [139]: df['date'] = [(x[:4] + '-' + x[4:6] + '-' + x[6]) for x in df['date']]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-139-f88e4248d599> in <module>()
----> 1 df['date'] = [(x[:4] + '-' + x[4:6] + '-' + x[6]) for x in df['date']]

IndexError: invalid index to scalar variable.

One way to address the problem is to convert the numbers to strings, and then convert the strings to pandas.Timestamps:

In [136]: df['date'].astype('|S8').map(pd.Timestamp)
Out[136]: 
0   2000-01-01
1   2000-01-02
Name: date, dtype: datetime64[ns]
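
As a possible alternative, here is a minimal sketch that does the conversion with pd.to_datetime and an explicit format instead of a byte-string dtype. It assumes the column holds whole-number float64 values with no missing entries; the sample frame and the date_str column name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: YYYYMMDD dates stored as float64, as in the question.
df = pd.DataFrame({'date': np.array([20000101, 20000102], dtype='float64')})

# Cast to integer first so the string form has no trailing '.0',
# then parse the 8-character strings with an explicit format.
as_text = df['date'].astype('int64').astype(str)
df['date'] = pd.to_datetime(as_text, format='%Y%m%d')

# Optionally render the parsed dates back to 'YYYY-MM-DD' strings.
df['date_str'] = df['date'].dt.strftime('%Y-%m-%d')

print(df)
```

If the column can contain NaN, the integer cast will fail, so those rows would need to be handled first.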

Upvotes: 1

Alexander Zh

Reputation: 111

Try this snippet, which parses the YYYYMMDD string with strptime and formats it back out with strftime.

>>> from datetime import datetime
>>> datetime.strptime('20140101', '%Y%m%d')
datetime.datetime(2014, 1, 1, 0, 0)
>>> date = datetime.strptime('20140101', '%Y%m%d')
>>> date.strftime('%Y-%m-%d')
'2014-01-01'

Upvotes: 0
