Reputation: 117
I am combining two data sets that use different definitions of midnight. I therefore want to add a day to one of the datasets for every occurrence of midnight, so that both follow the same date convention.
I have structured my date and time by:
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%d/%m/%Y')
df['Hour'] = pd.to_datetime(df['Hour']).dt.strftime('%H:%M')
And then I am trying to amend any occurrence of 00:00:00 so that it is one day later:
df.loc['Hour' == '00:00:00', 'Date'] = pd.DatetimeIndex(df.Date) + timedelta(days=1)
However, I keep getting a KeyError:
raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'
Any help would be greatly appreciated.
Aim:
Input:
Date | Hour
---------------------
19/06/2016 | 23:30
19/06/2016 | 23:45
19/06/2016 | 00:00
20/06/2016 | 00:15
20/06/2016 | 00:30
Output:
Date | Hour
---------------------
19/06/2016 | 23:30
19/06/2016 | 23:45
20/06/2016 | 00:00
20/06/2016 | 00:15
20/06/2016 | 00:30
Upvotes: 1
Views: 191
Reputation: 862431
You can use mask to check for midnight and add one day:
f['Date'] = pd.to_datetime(f['Date'])
m = f['Hour'] == '00:00'
f['Date'] = f['Date'].mask(m, f['Date'] + pd.Timedelta(1, unit='d')).dt.strftime('%d/%m/%Y')
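For context on the original error: 'Hour' == '00:00:00' compares two Python string literals, so it evaluates to a single False instead of a row-wise mask, and that is what .loc rejects. Below is a minimal sketch on the question's sample data. The names single_bool and midnight are illustrative, and dayfirst=True is an added assumption so that the dd/mm/YYYY strings are parsed day-first.

```python
import pandas as pd

# Sample data from the question (dates are dd/mm/YYYY strings).
df = pd.DataFrame({
    'Date': ['19/06/2016', '19/06/2016', '19/06/2016', '20/06/2016', '20/06/2016'],
    'Hour': ['23:30', '23:45', '00:00', '00:15', '00:30'],
})

# Comparing two string literals gives one plain bool, not a mask.
single_bool = 'Hour' == '00:00:00'

# Comparing the column gives one bool per row, which is what .loc and mask need.
midnight = df['Hour'] == '00:00'

# Assumption: dates are day-first, so say so explicitly when parsing.
dates = pd.to_datetime(df['Date'], dayfirst=True)

# Shift only the midnight rows by one day, then format back to strings.
df['Date'] = dates.mask(midnight, dates + pd.Timedelta(days=1)).dt.strftime('%d/%m/%Y')
print(df)
```

Note that the Hour column was formatted with '%H:%M', so the value to compare against is '00:00', not '00:00:00'.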
Solution with loc:
m = f['Hour'] == '00:00'
dates = pd.to_datetime(f['Date'])
f.loc[m, 'Date'] = (dates + pd.Timedelta(1, unit='d')).dt.strftime('%d/%m/%Y')
#alternative
#f.loc[m, 'Date'] = (dates[m] + pd.Timedelta(1, unit='d')).dt.strftime('%d/%m/%Y')
Solution with numpy.where:
m = f['Hour'] == '00:00'
dates = (pd.to_datetime(f['Date']) + pd.Timedelta(1, unit='d')).dt.strftime('%d/%m/%Y')
f['Date'] = np.where(m, dates, f['Date'])
print (f)
Date Hour
0 19/06/2016 23:30
1 19/06/2016 23:45
2 20/06/2016 00:00
3 20/06/2016 00:15
4 20/06/2016 00:30
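One caveat, which is an assumption about the wider data and not something the question states: the sample dates are unambiguous because the day is 19 or 20, but a string such as 01/06/2016 is read month-first by pd.to_datetime unless told otherwise. A small sketch of the difference, using illustrative variable names:

```python
import pandas as pd

ambiguous = '01/06/2016'  # intended here as 1 June 2016

# Default parsing treats the first field as the month.
month_first = pd.to_datetime(ambiguous)

# Either of these makes the day-first intent explicit.
day_first = pd.to_datetime(ambiguous, dayfirst=True)
explicit = pd.to_datetime(ambiguous, format='%d/%m/%Y')

print(month_first.date(), day_first.date(), explicit.date())
```

If the files contain days from 1 to 12, passing dayfirst=True or an explicit format to every pd.to_datetime call on the Date column is probably the safer choice.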
Within Code:
for fname in glob.glob(path):
    fname = fname.replace(r'\2016', '/2016')
    f = pd.DataFrame(pd.read_csv(fname))
    f = f.replace({'Hour': {'24:00:00': '00:00'}})
    f['Date'] = pd.to_datetime(f['Date']).dt.strftime('%d/%m/%Y')
    f['Hour'] = pd.to_datetime(f['Hour']).dt.strftime('%H:%M')
    m = f['Hour'] == '00:00'
    dates = (pd.to_datetime(f['Date']) + pd.Timedelta(1, unit='d')).dt.strftime('%d/%m/%Y')
    f['Date'] = np.where(m, dates, f['Date'])
    print(fname)
    if a == 0:
        f_2016['Date'] = f['Date']
        f_2016['Hour'] = f['Hour']
        a = 1
    f_2016 = pd.merge(f_2016, f, on=['Date', 'Hour'])
print(pd.DataFrame.head(f_2016, n=100))
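As a possible alternative for the raw files, here is a sketch that assumes the Hour column holds HH:MM:SS strings and that end of day is written as 24:00:00 (the replace call above suggests this, but it is a guess about the data). Adding the hour as a timedelta to the parsed date rolls 24:00:00 over to the next day without a separate mask. The frame raw and the column values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw rows; '24:00:00' marks the end of 19 June.
raw = pd.DataFrame({
    'Date': ['19/06/2016', '19/06/2016', '20/06/2016'],
    'Hour': ['23:45:00', '24:00:00', '00:15:00'],
})

# Parse the date day-first and add the time of day as a duration.
# A duration of 24 hours lands on midnight of the following day.
stamp = pd.to_datetime(raw['Date'], format='%d/%m/%Y') + pd.to_timedelta(raw['Hour'])

# Rebuild the string columns used as merge keys.
raw['Date'] = stamp.dt.strftime('%d/%m/%Y')
raw['Hour'] = stamp.dt.strftime('%H:%M')
print(raw)
```

This only helps when 24:00:00 is still present in the data; once it has been replaced by 00:00, the mask-based solutions above are needed.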
Upvotes: 1
Reputation: 164613
My advice is not to separate Date & Time unless you must.
You can test whether a time is midnight by comparing your datetime column to a normalized version of itself:
import pandas as pd
f = pd.DataFrame({'Date': ['2018/01/01 15:00', '2018/01/02 00:00']})
f['Date'] = pd.to_datetime(f['Date'])
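# A timestamp equal to its own normalized value (time set to 00:00) is a midnight row.
midnight = f['Date'] == f['Date'].dt.normalize()
# Add the one-day offset directly to the selected rows of the Date column.
f.loc[midnight, 'Date'] = f.loc[midnight, 'Date'] + pd.DateOffset(days=1)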
# Date
# 0 2018-01-01 15:00:00
# 1 2018-01-03 00:00:00
You can adapt this solution if you really must keep time separate:
f = pd.DataFrame({'Date': ['2018/01/01', '2018/01/02'],
'Hour': ['15:00', '00:00']})
f['Date'] = pd.to_datetime(f['Date'])
mask = pd.to_datetime(f['Date'].astype(str)+' '+f['Hour']) == f['Date']
f.loc[mask, 'Date'] = f.loc[mask, 'Date'] + pd.DateOffset(days=1)
Upvotes: 1