DJW001
DJW001

Reputation: 153

Generating conditional timedeltas in Python across multiple rows and columns

I am working with weather data and am trying to calculate the number of daylight mins which correspond to the hourly observations within my timeseries.

London = pd.read_csv(root_dir + 'London.csv',
                     usecols=['date_time','London_sunrise','London_sunset'], 
                     parse_dates=['date_time'])

London.set_index(London['date_time'], inplace =True)

London['London_sunrise'] = pd.to_datetime(London['London_sunrise']).dt.strftime('%H:%M')
London['London_sunset'] = pd.to_datetime(London['London_sunset']).dt.strftime('%H:%M')
London['time'] = pd.to_datetime(London['date_time']).dt.strftime('%H:%M')

London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], '60', '0')

London.head(6)

Dataframe:


date_time               time            London_sunrise  London_sunset   London_sun_mins     
2019-05-21 00:00:00     00:00           05:01           20:54           0
2019-05-21 01:00:00     01:00           05:01           20:54           0
2019-05-21 02:00:00     02:00           05:01           20:54           0
2019-05-21 03:00:00     03:00           05:01           20:54           0
2019-05-21 04:00:00     04:00           05:01           20:54           0
2019-05-21 05:00:00     05:00           05:01           20:54           0
2019-05-21 06:00:00     06:00           05:01           20:54           60

I have tried conditional arguments to generate the number of sunlight mins per hour, ie) 60 if a full sunlight hour, 0 if night.

When I try to use a timedelta to generate the difference between sunrise and time ie) 05:00 and 05:01, the anticipated output isn't returned (59).

A simple: London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], '60', '0')

Gets close to the required output, however, when I try to extend to:

London['London_sun_mins'] = np.where(London['time']>=London['London_sunrise'], London['time'] - London['London_sunrise'], '0')

The following error is returned: unsupported operand type(s) for -: 'str' and 'str'

Also, when extending to encompass both sunrise and sunset:

London['sunlightmins'] = London[(London['London_sunrise'] >= London['date_time'] & London['London_sunset'] <= London['date_time'])]
London['London_sun_mins'] = np.where(np.logical_and(np.greater_equal(London['time'],London['London_sunrise']),np.less_equal(London['time'],London['London_sunset'])))

The same error is returned. All help in reaching the anticipated output is appreciated!

Upvotes: 0

Views: 92

Answers (1)

Raphaele Adjerad
Raphaele Adjerad

Reputation: 1125

I suggest staying with datetime types so that you can use the difference directly. Indeed, you have converted the hours to strings, and therefore when you try to substract them it gives you this error. But if you have datetime variables, you can substract them directly as follows:

# First I reproduce you dataset 
import pandas as pd
London = pd.DataFrame({"date_time": pd.date_range("2019-05-21", periods=7, freq = "H"),
                   "London_sunrise" : "05:01",
                   "London_sunset" : "20:54"})
# I extract the date from date_time
London["date"] = London["date_time"].dt.date
# Then I create a datetime variable for sunrise and sunset with the same date 
# as my date_time variable and the hour from London_sunset and London_sunrise
London["sunrise_dtime"] = London.apply(lambda r: str(r["date"]) + " " + \
                                    r["London_sunrise"] + ":00", 1)
London["sunset_dtime"] = London.apply(lambda r: str(r["date"]) + " " + \
                                    r["London_sunset"] + ":00", 1)
# I transform them to datetime
London['sunrise_dtime'] = pd.to_datetime(London['sunrise_dtime'])
London['sunset_dtime'] = pd.to_datetime(London['sunset_dtime'])

# Then I can substract the two datetimes:
London['London_sun_mins'] = np.where(London['date_time']>=London['sunrise_dtime'],
                                     London['date_time'] - London['sunrise_dtime'], 0)

Here is the result:

           date_time London_sunrise  ...        sunset_dtime London_sun_mins
0 2019-05-21 00:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
1 2019-05-21 01:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
2 2019-05-21 02:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
3 2019-05-21 03:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
4 2019-05-21 04:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
5 2019-05-21 05:00:00          05:01  ... 2019-05-21 20:54:00        00:00:00
6 2019-05-21 06:00:00          05:01  ... 2019-05-21 20:54:00        00:59:00

Hope it helps

Upvotes: 2

Related Questions