Reputation: 121
I have a data frame representing the schedule of some restaurants in a week.
week_hours
to my initial Dataframe df
that represents the total number of hours the restaurant is open per week.The Data (generated with df.head(20).to_dict('split')
)
The days of the weeks in this example are in French
{'index': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19],
'columns': ['restaurant_id',
'lundi',
'mardi',
'mercredi',
'jeudi',
'vendredi',
'samedi',
'dimanche'],
'data': [['lCwqJWMxvIUQt1Re_tDn4w',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0'],
['pd0v6sOqpLhFJ7mkpIaixw',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-22:0',
'11:0-22:0',
'11:0-17:0'],
['0vhi__HtC2L4-vScgDFdFw',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'12:0-22:0',
'16:30-21:30'],
['t65yfB9v9fqlhAkLnnUXdg',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
nan,
'11:30-21:0'],
['i7_JPit-2kAbtRTLkic2jA',
'11:30-22:0',
'11:30-22:0',
'11:30-23:0',
'11:30-23:0',
'11:30-23:0',
nan,
nan],
['vMh4madPU3qhNX7P7d8WGA', nan, nan, nan, nan, nan, nan, nan],
['BsvCTCVG7lrzXZ68VyyIcg',
'0:0-0:0',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30'],
['es3Fq9KNp6Ry994x4T4ZYg',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'7:0-14:0',
nan],
['Xb7jOAa17xtT_uA4sCCAsg', nan, nan, nan, nan, nan, nan, nan],
['1vrrpIhpK628PUA0XWWd8g',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0'],
['NYKxikYKbkacWumJ82TxzA',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0'],
['4sRJvmKh43AqMRrjdwEdwA',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-23:0',
'11:0-23:0',
'11:0-23:0'],
['laac2uH1lQVzBjKFUjuA1Q',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'8:0-14:0'],
['vVOoL5H8Fr-qlQv-_DdoMA',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-23:0',
'10:0-23:0',
'10:0-22:0'],
['k1c4gg8Ri5dre6ruPUKxJQ',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-22:30',
'9:0-22:30',
'12:0-21:0'],
['x9f9NBMweyyjCQHuc9K4sw',
'11:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-18:0',
'10:0-18:0',
nan],
['KWfLQddMBZNoh1bVcgASfA',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0'],
['4ScLXRii_WwBn5PbGBI-eg', nan, nan, nan, nan, nan, nan, nan],
['LAswzVTnT3uCvnKr-SwxEg', nan, nan, nan, nan, nan, nan, nan],
['G_wqVaqV3TBsZPAIIRCU-Q',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0']]}
what is the syntax allowing me to extract the number of hours from each column (representing days in this case) to calculate the number of business hours in the week inside a new column?
If any clarifications or simpler example needed, please ask.
Edit - tried a solution listed below but the results does not make sens (in the first line for example?)
Upvotes: 0
Views: 382
Reputation: 1272
I think this should do your stuff. It saves the new data as float (in hours). With datetime
(https://docs.python.org/3/library/datetime.html) you can easily calulate times, if you want to.
This applays the function calculate_hours
to the 7 given columns (days):
def calculate_hours(row: pd.Series) -> float:
try:
# split the given times to start and end time
opening_time, closing_time = row.split("-")
# split hours and minutes
start_hour, start_minute = opening_time.split(":")
end_hour, end_minute = closing_time.split(":")
# calculate start time (in hours)
start_time = float(start_hour) + float(start_minute) / 60
# calculate end time (in hours)
end_time = float(end_hour) + float(end_minute) / 60
# handle overneight and 24h openings
if start_time >= end_time:
end_time += 24
# return the duration from start time to end time
return end_time - start_time
# bare except are not recommended, you should look for your data, what could go wrong
except:
return 0.0
# Save the given data to the new column "open"
# sums up values for each day
df["open"] = df["lundi"].apply(calculate_hours) +\
df["mardi"].apply(calculate_hours) +\
df["mercredi"].apply(calculate_hours) +\
df["jeudi"].apply(calculate_hours) +\
df["vendredi"].apply(calculate_hours) +\
df["samedi"].apply(calculate_hours) +\
df["dimanche"].apply(calculate_hours)
PS: I used this to "import" your data, it does not look perfect, but i didn't know how to use your data better:
import pandas as pd
import datetime
all = {'index': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19],
'columns': ['restaurant_id',
'lundi',
'mardi',
'mercredi',
'jeudi',
'vendredi',
'samedi',
'dimanche'],
'data': [['lCwqJWMxvIUQt1Re_tDn4w',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0'],
['pd0v6sOqpLhFJ7mkpIaixw',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-22:0',
'11:0-22:0',
'11:0-17:0'],
['0vhi__HtC2L4-vScgDFdFw',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'12:0-22:0',
'16:30-21:30'],
['t65yfB9v9fqlhAkLnnUXdg',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
None,
'11:30-21:0'],
['i7_JPit-2kAbtRTLkic2jA',
'11:30-22:0',
'11:30-22:0',
'11:30-23:0',
'11:30-23:0',
'11:30-23:0',
None,
None],
['vMh4madPU3qhNX7P7d8WGA', None, None, None, None, None, None, None],
['BsvCTCVG7lrzXZ68VyyIcg',
'0:0-0:0',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30'],
['es3Fq9KNp6Ry994x4T4ZYg',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'7:0-14:0',
None],
['Xb7jOAa17xtT_uA4sCCAsg', None, None, None, None, None, None, None],
['1vrrpIhpK628PUA0XWWd8g',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0'],
['NYKxikYKbkacWumJ82TxzA',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0'],
['4sRJvmKh43AqMRrjdwEdwA',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-23:0',
'11:0-23:0',
'11:0-23:0'],
['laac2uH1lQVzBjKFUjuA1Q',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'8:0-14:0'],
['vVOoL5H8Fr-qlQv-_DdoMA',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-23:0',
'10:0-23:0',
'10:0-22:0'],
['k1c4gg8Ri5dre6ruPUKxJQ',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-22:30',
'9:0-22:30',
'12:0-21:0'],
['x9f9NBMweyyjCQHuc9K4sw',
'11:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-18:0',
'10:0-18:0',
None],
['KWfLQddMBZNoh1bVcgASfA',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0'],
['4ScLXRii_WwBn5PbGBI-eg', None, None, None, None, None, None, None],
['LAswzVTnT3uCvnKr-SwxEg', None, None, None, None, None, None, None],
['G_wqVaqV3TBsZPAIIRCU-Q',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0']]}
data = all["data"]
df = pd.DataFrame(data)
df.columns = all["columns"]
Upvotes: 1