Reputation: 153
I am working with weather data and am trying to calculate the number of daylight mins which correspond to the hourly observations within my timeseries.
London = pd.read_csv(root_dir + 'London.csv',
usecols=['date_time','London_sunrise','London_sunset'],
parse_dates=['date_time'])
London.set_index(London['date_time'], inplace =True)
London['London_sunrise'] = pd.to_datetime(London['London_sunrise']).dt.strftime('%H:%M')
London['London_sunset'] = pd.to_datetime(London['London_sunset']).dt.strftime('%H:%M')
London['time'] = pd.to_datetime(London['date_time']).dt.strftime('%H:%M')
London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], '60', '0')
London.head(6)
Dataframe:
date_time time London_sunrise London_sunset London_sun_mins
2019-05-21 00:00:00 00:00 05:01 20:54 0
2019-05-21 01:00:00 01:00 05:01 20:54 0
2019-05-21 02:00:00 02:00 05:01 20:54 0
2019-05-21 03:00:00 03:00 05:01 20:54 0
2019-05-21 04:00:00 04:00 05:01 20:54 0
2019-05-21 05:00:00 05:00 05:01 20:54 0
2019-05-21 06:00:00 06:00 05:01 20:54 60
I have tried conditional arguments to generate the number of sunlight mins per hour, ie) 60 if a full sunlight hour, 0 if night.
When I try to use a timedelta to generate the difference between sunrise and time ie) 05:00 and 05:01, the anticipated output isn't returned (59).
A simple:
London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], '60', '0')
Gets close to the required output, however, when I try to extend to:
London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], London['time'] - London['London_sunrise'], '0')
The following error is returned:
unsupported operand type(s) for -: 'str' and 'str'
Also, when extending to encompass both sunrise and sunset:
London['sunlightmins'] = London[(London['London_sunrise'] >= London['date_time'] & London['London_sunset'] <= London['date_time'])]
London['London_sun_mins'] = np.where(np.logical_and(np.greater_equal(London['time'],London['London_sunrise']),np.less_equal(London['time'],London['London_sunset'])))
The same error is returned. All help in reaching the anticipated output is appreciated!
Upvotes: 0
Views: 92
Reputation: 1125
I suggest staying with datetime types so that you can use the difference directly. Indeed, you have converted the hours to strings, and therefore when you try to substract them it gives you this error. But if you have datetime variables, you can substract them directly as follows:
# First I reproduce you dataset
import pandas as pd
London = pd.DataFrame({"date_time": pd.date_range("2019-05-21", periods=7, freq = "H"),
"London_sunrise" : "05:01",
"London_sunset" : "20:54"})
# I extract the date from date_time
London["date"] = London["date_time"].dt.date
# Then I create a datetime variable for sunrise and sunset with the same date
# as my date_time variable and the hour from London_sunset and London_sunrise
London["sunrise_dtime"] = London.apply(lambda r: str(r["date"]) + " " + \
r["London_sunrise"] + ":00", 1)
London["sunset_dtime"] = London.apply(lambda r: str(r["date"]) + " " + \
r["London_sunset"] + ":00", 1)
# I transform them to datetime
London['sunrise_dtime'] = pd.to_datetime(London['sunrise_dtime'])
London['sunset_dtime'] = pd.to_datetime(London['sunset_dtime'])
# Then I can substract the two datetimes:
London['London_sun_mins'] = np.where(London['date_time']>=London['sunrise_dtime'],
London['date_time'] - London['sunrise_dtime'], 0)
Here is the result:
date_time London_sunrise ... sunset_dtime London_sun_mins
0 2019-05-21 00:00:00 05:01 ... 2019-05-21 20:54:00 00:00:00
1 2019-05-21 01:00:00 05:01 ... 2019-05-21 20:54:00 00:00:00
2 2019-05-21 02:00:00 05:01 ... 2019-05-21 20:54:00 00:00:00
3 2019-05-21 03:00:00 05:01 ... 2019-05-21 20:54:00 00:00:00
4 2019-05-21 04:00:00 05:01 ... 2019-05-21 20:54:00 00:00:00
5 2019-05-21 05:00:00 05:01 ... 2019-05-21 20:54:00 00:00:00
6 2019-05-21 06:00:00 05:01 ... 2019-05-21 20:54:00 00:59:00
Hope it helps
Upvotes: 2