Reputation: 1366
I have data in a DataFrame (df1) that starts and ends as shown below. I'm trying to shift columns "0" and "1" (the date and time) back one hour, so that the timestamps start at hour 0 instead of hour 1.
Start of df1:
0 1 2 3 4 5 6 7
0 20160101 100 7.977169 109404.0 20160101 100 4.028678 814.0
1 20160101 200 8.420204 128546.0 20160101 200 4.673662 2152.0
2 20160101 300 9.515370 165931.0 20160101 300 8.019863 8100.0
End of df1:
0 1 2 3 4 5 6 7
8780 20161231 2100 4.198906 11371.0 20161231 2100 0.995571 131.0
8781 20161231 2200 4.787433 19083.0 20161231 2200 1.029809 NaN
8782 20161231 2300 3.987506 9354.0 20161231 2300 0.900942 NaN
8783 20170101 0 3.284947 1815.0 20170101 0 0.899262 NaN
I need the date and time shifted back one hour, so that each time marks the beginning of the hour rather than the end. The result should start like this:
0 1 2 3 4 5 6 7
0 20160101 000 7.977169 109404.0 20160101 100 4.028678 814.0
1 20160101 100 8.420204 128546.0 20160101 200 4.673662 2152.0
2 20160101 200 9.515370 165931.0 20160101 300 8.019863 8100.0
and end like this:
0 1 2 3 4 5 6 7
8780 20161231 2000 4.198906 11371.0 20161231 2100 0.995571 131.0
8781 20161231 2100 4.787433 19083.0 20161231 2200 1.029809 NaN
8782 20161231 2200 3.987506 9354.0 20161231 2300 0.900942 NaN
8783 20161231 2300 3.284947 1815.0 20170101 0 0.899262 NaN
I have no real idea how to accomplish this, or even how to research it. Thank you.
Upvotes: 5
Views: 4028
Reputation: 23099
It would be better to create a proper datetime object and then subtract one hour, which automatically handles the rollover into the previous day. We can then use dt.strftime to re-create your object (string) columns.
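A quick check of the rollover claim, as a minimal sketch. It uses pd.Timedelta rather than the pd.DateOffset used below; for a fixed one-hour shift both give the same result. Subtracting one hour from the question's final midnight row lands on 23:00 of the previous day, with the date moved back too:

```python
import pandas as pd

# Last row of the question: date 20170101, time 0 (midnight, padded to "0000")
stamp = pd.to_datetime("20170101" + "0".zfill(4), format="%Y%m%d%H%M")
earlier = stamp - pd.Timedelta(hours=1)

print(stamp)    # 2017-01-01 00:00:00
print(earlier)  # 2016-12-31 23:00:00
print(earlier.strftime("%Y%m%d"), earlier.strftime("%H%M"))  # 20161231 2300
```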
import pandas as pd

s = pd.to_datetime(
    df[0].astype(str) + df[1].astype(str).str.zfill(4), format="%Y%m%d%H%M"
)
print(s)
0 2016-01-01 01:00:00
1 2016-01-01 02:00:00
2 2016-01-01 03:00:00
...
8780 2016-12-31 21:00:00
8781 2016-12-31 22:00:00
8782 2016-12-31 23:00:00
8783 2017-01-01 00:00:00
dtype: datetime64[ns]
df[1] = (s - pd.DateOffset(hours=1)).dt.strftime("%H%M").str.lstrip("0").str.zfill(3)
df[0] = (s - pd.DateOffset(hours=1)).dt.strftime("%Y%m%d")
print(df)
0 1 2 3 4 5 6 7
0 20160101 000 7.977169 109404.0 20160101 100 4.028678 814.0
1 20160101 100 8.420204 128546.0 20160101 200 4.673662 2152.0
2 20160101 200 9.515370 165931.0 20160101 300 8.019863 8100.0
...
8780 20161231 2000 4.198906 11371.0 20161231 2100 0.995571 131.0
8781 20161231 2100 4.787433 19083.0 20161231 2200 1.029809 NaN
8782 20161231 2200 3.987506 9354.0 20161231 2300 0.900942 NaN
8783 20161231 2300 3.284947 1815.0 20170101 0 0.899262 NaN
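The same steps wrapped in a reusable function, as a sketch. The function name shift_back_one_hour and its date_col/time_col parameters are hypothetical. It assumes the date column holds YYYYMMDD values and the time column holds HHMM values without leading zeros, as in the question, and it returns a copy with both columns as strings. The lstrip("0")/zfill(3) pair is what turns midnight into "000" while leaving "100" and "2300" unchanged:

```python
import pandas as pd


def shift_back_one_hour(df: pd.DataFrame, date_col=0, time_col=1) -> pd.DataFrame:
    """Return a copy of df with date_col/time_col moved back one hour.

    Assumes date_col holds YYYYMMDD values and time_col holds HHMM values
    without leading zeros (0, 100, ..., 2300), as in the question.
    """
    out = df.copy()
    # Build real timestamps, padding times like 100 -> "0100"
    stamps = pd.to_datetime(
        out[date_col].astype(str) + out[time_col].astype(str).str.zfill(4),
        format="%Y%m%d%H%M",
    )
    shifted = stamps - pd.Timedelta(hours=1)  # rolls the date back when needed
    out[date_col] = shifted.dt.strftime("%Y%m%d")
    # "0000" -> "" -> "000"; "0100" -> "100"; "2300" stays "2300"
    out[time_col] = shifted.dt.strftime("%H%M").str.lstrip("0").str.zfill(3)
    return out


# Date/time pairs taken from the question's df1 (columns 0 and 1 only)
sample = pd.DataFrame({0: [20160101, 20160101, 20170101], 1: [100, 200, 0]})
result = shift_back_one_hour(sample)
print(result)
```

If integer columns are preferred afterwards, the date column can be converted back with .astype(int); the time column cannot keep "000" as an integer.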
Upvotes: 4
Reputation: 71687
Use DataFrame.shift to shift columns 0 and 1 down by one row, then use Series.bfill on column 0 of df2 to fill the missing first date and .fillna on column 1 of df2 to fill the missing first time. Finally, use DataFrame.join to join df2 with the remaining columns of df1:
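df2 = df1[['0', '1']].shift()          # each row takes the previous row's date/time
df2['0'] = df2['0'].bfill()            # first row: fill the missing date from below
df2['1'] = df2['1'].fillna('000')      # first row: set the time to midnight
df2 = df2.join(df1.loc[:, '2':])       # re-attach the untouched data columns
print(df2)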
0 1 2 3 4 5 6 7
0 20160101 000 7.977169 109404.0 20160101 100 4.028678 814.0
1 20160101 100 8.420204 128546.0 20160101 200 4.673662 2152.0
2 20160101 200 9.515370 165931.0 20160101 300 8.019863 8100.0
...
8780 20161231 2000 4.198906 11371.0 20161231 2100 0.995571 131.0
8781 20161231 2100 4.787433 19083.0 20161231 2200 1.029809 NaN
8782 20161231 2200 3.987506 9354.0 20161231 2300 0.900942 NaN
8783 20161231 2300 3.284947 1815.0 20170101 0 0.899262 NaN
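A self-contained run of the shift approach on the first three rows of the question's data, as a sketch. It assumes string column labels ('0', '1', '2') and string date/time values, which is what df1[['0', '1']] and fillna('000') imply. Because shift() moves values by row position, this only gives the right answer when every row is exactly one hour after the previous one, with no gaps:

```python
import pandas as pd

# First three rows of the question's df1, assuming string column labels
# and string date/time values (only column '2' of the data is kept here).
df1 = pd.DataFrame({
    "0": ["20160101", "20160101", "20160101"],
    "1": ["100", "200", "300"],
    "2": [7.977169, 8.420204, 9.515370],
})

df2 = df1[["0", "1"]].shift()        # row i gets row i-1's date/time; row 0 gets NaN
df2["0"] = df2["0"].bfill()          # row 0: copy the date from the row below
df2["1"] = df2["1"].fillna("000")    # row 0: midnight
df2 = df2.join(df1.loc[:, "2":])     # label slice: column '2' onward
print(df2)
```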
Upvotes: 2
Reputation: 320
You can do arithmetic on the time column directly in pandas, as long as it holds numbers rather than strings. Here is an example of how it can be done:
import pandas as pd
df = pd.DataFrame()
df['time'] = [0, 100, 500, 2100, 2300, 0]  # sample HHMM times as integers
df['time'] = df['time'] - 100  # move every time back one hour
Every time value is now 100 (one hour) lower. One case needs special handling: midnight (0) becomes -100, so map it back to 2300:
# Vectorized fix; avoids chained assignment like df['time'].iloc[i] = ...
df.loc[df['time'] == -100, 'time'] = 2300
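The subtraction above fixes the time but leaves the date unchanged, so the wrapped midnight row (20170101 0 in the question) still carries the next day's date. A sketch that also moves that date back one day, assuming integer columns named date and time (hypothetical names; the question's frame uses 0 and 1):

```python
import pandas as pd

# Hypothetical integer columns: 'date' as YYYYMMDD, 'time' as HHMM,
# using the last two date/time pairs from the question's df1.
df = pd.DataFrame({"date": [20161231, 20170101], "time": [2300, 0]})

midnight = df["time"] == 0        # rows that will wrap to the previous day
df["time"] = df["time"] - 100     # 2300 -> 2200, 0 -> -100
df.loc[midnight, "time"] = 2300   # fix the wrap-around

# Move the date back one day, but only on the wrapped rows
prev_day = pd.to_datetime(
    df.loc[midnight, "date"].astype(str), format="%Y%m%d"
) - pd.Timedelta(days=1)
df.loc[midnight, "date"] = prev_day.dt.strftime("%Y%m%d").astype(int)
print(df)
```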
Upvotes: -1