Reputation: 77
I have a csv file showing daily sales in months. Problem is, I want to calculate the sales made in a given time interval. Sample of my first nine (9):
sales = pd.DataFrame({ 'Date': ['2020-02-01 00:13:00', '2020-02-01 02:10:00', '2020-02-01 03:03:00',
'2020-02-01 06:52:00', '2020-02-01 09:19:00', '2020-02-01 09:50:00',
'2020-02-01 10:00:00', '2020-02-01 10:06:00', '2020-02-19 21:43:00'],
'Amount': [2000, 26080, 5060, 800, 3000, 10934, 2250, 3600, 11528] })
All I want to do is get the sum of sales made in the following time periods: '00:00:00' - '04:59:00' '05:00:00' - '07:59:00' '08:00:00' - '10:00:00' '11:00:00' - '23:59:00'
I tried using Pandas dataframe.between_time()
but then the results are no desirable.
a = sales.between_time('00:00:00', '04:59:00')['Amount'].sum
b = sales.between_time('05:00:00', '07:59:00')['Amount'].sum
c = sales.between_time('08:00:00', '10:00:00')['Amount'].sum
d = sales.between_time('11:00:00', '23:59:00')['Amount'].sum
Is there a Pythonic way of summing the sales made in a given time intervals of my problem?
Upvotes: 2
Views: 470
Reputation: 2917
I make a column of Time only. Check this:
sales = pd.DataFrame({ 'Date': ['2020-02-01 00:13:00', '2020-02-01 02:10:00', '2020-02-01 03:03:00',
'2020-02-01 06:52:00', '2020-02-01 09:19:00', '2020-02-01 09:50:00',
'2020-02-01 10:00:00', '2020-02-01 10:06:00', '2020-02-19 21:43:00'],
'Amount': [2000, 26080, 5060, 800, 3000, 10934, 2250, 3600, 11528] })
sales["Time"] = pd.to_datetime(sales["Date"]).dt.time
a = sales[(sales["Time"] > pd.to_datetime("00:00:00").time()) & (sales["Time"] < pd.to_datetime("04:59:00").time())]["Amount"].sum()
print(a)
>>> 33140
Upvotes: 0
Reputation: 686
You're pretty much there, just change the Date column to a timestamp rather than a string, set it as the index and you're good to go
sales = pd.DataFrame({ 'Date': ['2020-02-01 00:13:00', '2020-02-01 02:10:00', '2020-02-01 03:03:00',
'2020-02-01 06:52:00', '2020-02-01 09:19:00', '2020-02-01 09:50:00',
'2020-02-01 10:00:00', '2020-02-01 10:06:00', '2020-02-19 21:43:00'],
'Amount': [2000, 26080, 5060, 800, 3000, 10934, 2250, 3600, 11528] })
sales['Date'] = sales['Date'].apply(lambda x: pd.to_datetime(x), 1)
sales = sales.set_index('Date')
Now we can use between_time
a = sales.between_time('00:00:00', '04:59:00')['Amount'].sum()
print(a) <- 33140
There are more efficient ways to get a timestamp but the above should work
Upvotes: 3
Reputation: 394
You don't have the corresponding value to amount at '04:59:00'
so it is is not possible to do so. If there is data available at these intervals then it would be possible
Upvotes: 0