Reputation: 79
I'm new to Python, so I hope my question isn't too silly... I want to join two pandas DataFrames (f1 and f3), and it seems that their indices are different.
f1:
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
'2018-01-09', '2018-01-10',
...
'2018-12-22', '2018-12-23', '2018-12-24', '2018-12-25',
'2018-12-26', '2018-12-27', '2018-12-28', '2018-12-29',
'2018-12-30', '2018-12-31'],
dtype='datetime64[ns]', name='date', length=365, freq=None)
f3:
Index([2018-01-01, 2018-01-02, 2018-01-07, 2018-03-30, 2018-04-01, 2018-04-02,
2018-05-01, 2018-05-10, 2018-05-20, 2018-05-21, 2018-06-04, 2018-08-01,
2018-12-25, 2018-12-26],
dtype='object')
Now if I join them in the order cat = [f1, f3] with
cat_total = pd.concat(cat, axis=1, sort=False)
it seems to work, and the result looks correct:
print(cat_total.head())
weekday holidays
2018-01-01 0 Neujahrestag
2018-01-02 1 Berchtoldstag
2018-01-03 2 NaN
2018-01-04 3 NaN
2018-01-05 4 NaN
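If I change the order of cat to cat = [f3, f1], it doesn't work properly: the dates from f1 are appended as separate rows instead of being aligned with the dates from f3.
print(cat_total)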
holidays weekday
2018-01-01 Neujahrestag 0
2018-01-02 Berchtoldstag 1
2018-01-07 Test ZH 1 6
2018-03-30 Karfreitag 4
2018-04-01 Ostern 6
2018-04-02 Ostermontag 0
2018-05-01 Tag der Arbeit 1
2018-05-10 Auffahrt 3
2018-05-20 Pfingsten 6
2018-05-21 Pfingstmontag 0
2018-06-04 Test ZH 2 0
2018-08-01 Nationalfeiertag 2
2018-12-25 Weihnachten 1
2018-12-26 Stephanstag 2
2018-01-01 00:00:00 NaN 0
2018-01-02 00:00:00 NaN 1
2018-01-03 00:00:00 NaN 2
2018-01-04 00:00:00 NaN 3
2018-01-05 00:00:00 NaN 4
2018-01-06 00:00:00 NaN 5
2018-01-07 00:00:00 NaN 6
Why does this happen? How can I change the index of one of the DataFrames so that both indices have the same format?
The f1 index comes from dates = pd.date_range(start=startdate, end=enddate, freq='D')
and the f3 index is the result of the external package 'holidays'.
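A minimal sketch of how the two index types might arise, to make the mismatch easy to inspect. It assumes the 'holidays' package hands back plain datetime.date objects (which would match the dtype='object' shown above); the dates are written by hand here instead of calling the package, and the short date range is only for illustration.

```python
import datetime as dt

import pandas as pd

# Assumed stand-ins for the real startdate / enddate
startdate = "2018-01-01"
enddate = "2018-01-10"

# f1: index built with pd.date_range, so it is a DatetimeIndex
dates = pd.date_range(start=startdate, end=enddate, freq="D")
f1 = pd.DataFrame({"weekday": dates.weekday}, index=dates)

# f3: index made of datetime.date objects (assumption about what
# the 'holidays' package yields), so pandas stores it as dtype object
f3 = pd.DataFrame(
    {"holidays": ["Neujahrestag", "Berchtoldstag"]},
    index=[dt.date(2018, 1, 1), dt.date(2018, 1, 2)],
)

print(type(f1.index), f1.index.dtype)
print(type(f3.index), f3.index.dtype)
print(type(f3.index[0]))
```

Comparing the two dtypes this way shows that the labels are different kinds of objects, even though they print as the same dates.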
I hope this is all the information needed. Thanks a lot in advance,
Marco
Upvotes: 0
Views: 297
Reputation: 2211
You can use pd.to_datetime to convert the dates to a common datetime format.
I assume the column is named DATE:
cat_total['DATE'] = pd.to_datetime(cat_total['DATE'],format='%Y-%m-%d', errors='ignore')
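Since the dates in the question live in the index rather than in a column, the same idea can also be applied to the index before concatenating. This is a minimal sketch, not a tested solution for the original data: it assumes f3's index holds datetime.date objects, uses hand-written sample rows, and a shortened date range.

```python
import datetime as dt

import pandas as pd

# Sample f1 with a DatetimeIndex (shortened range, for illustration)
dates = pd.date_range(start="2018-01-01", end="2018-01-10", freq="D")
f1 = pd.DataFrame({"weekday": dates.weekday}, index=dates)

# Sample f3 with an object index of datetime.date values (assumption)
f3 = pd.DataFrame(
    {"holidays": ["Neujahrestag", "Berchtoldstag"]},
    index=[dt.date(2018, 1, 1), dt.date(2018, 1, 2)],
)

# Convert the object index to a DatetimeIndex so both frames
# use the same kind of labels
f3.index = pd.to_datetime(f3.index)

# With matching index types, the order of the frames only affects
# the column order; rows are aligned by date
cat_total = pd.concat([f3, f1], axis=1).sort_index()
print(cat_total.head())
```

Converting before the concat keeps the alignment independent of whether f1 or f3 comes first in the list.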
Upvotes: 1