John Mutesi
John Mutesi

Reputation: 77

How to define hourly interval in python

I have a csv file showing daily sales in months. Problem is, I want to calculate the sales made in a given time interval. Sample of my first nine (9):

sales = pd.DataFrame({ 'Date': ['2020-02-01 00:13:00', '2020-02-01 02:10:00', '2020-02-01 03:03:00', 
                                '2020-02-01 06:52:00', '2020-02-01 09:19:00', '2020-02-01 09:50:00',
                                '2020-02-01 10:00:00', '2020-02-01 10:06:00', '2020-02-19 21:43:00'],
                       'Amount': [2000, 26080, 5060, 800, 3000, 10934, 2250, 3600, 11528] })

All I want to do is get the sum of sales made in the following time periods: '00:00:00' - '04:59:00' '05:00:00' - '07:59:00' '08:00:00' - '10:00:00' '11:00:00' - '23:59:00'

I tried using Pandas dataframe.between_time() but then the results are no desirable.

a = sales.between_time('00:00:00', '04:59:00')['Amount'].sum
b = sales.between_time('05:00:00', '07:59:00')['Amount'].sum
c = sales.between_time('08:00:00', '10:00:00')['Amount'].sum
d = sales.between_time('11:00:00', '23:59:00')['Amount'].sum

Is there a Pythonic way of summing the sales made in a given time intervals of my problem?

Upvotes: 2

Views: 470

Answers (3)

Ha Bom
Ha Bom

Reputation: 2917

I make a column of Time only. Check this:

sales = pd.DataFrame({ 'Date': ['2020-02-01 00:13:00', '2020-02-01 02:10:00', '2020-02-01 03:03:00', 
                                '2020-02-01 06:52:00', '2020-02-01 09:19:00', '2020-02-01 09:50:00',
                                '2020-02-01 10:00:00', '2020-02-01 10:06:00', '2020-02-19 21:43:00'],
                       'Amount': [2000, 26080, 5060, 800, 3000, 10934, 2250, 3600, 11528] })

sales["Time"] = pd.to_datetime(sales["Date"]).dt.time
a = sales[(sales["Time"] > pd.to_datetime("00:00:00").time()) & (sales["Time"] < pd.to_datetime("04:59:00").time())]["Amount"].sum()
print(a)

>>> 33140

Upvotes: 0

JimmyA
JimmyA

Reputation: 686

You're pretty much there, just change the Date column to a timestamp rather than a string, set it as the index and you're good to go

sales = pd.DataFrame({ 'Date': ['2020-02-01 00:13:00', '2020-02-01 02:10:00', '2020-02-01 03:03:00', 
    '2020-02-01 06:52:00', '2020-02-01 09:19:00', '2020-02-01 09:50:00',
    '2020-02-01 10:00:00', '2020-02-01 10:06:00', '2020-02-19 21:43:00'],
    'Amount': [2000, 26080, 5060, 800, 3000, 10934, 2250, 3600, 11528] })

sales['Date'] = sales['Date'].apply(lambda x: pd.to_datetime(x), 1) 
sales = sales.set_index('Date')

Now we can use between_time

a = sales.between_time('00:00:00', '04:59:00')['Amount'].sum()
print(a) <- 33140

There are more efficient ways to get a timestamp but the above should work

Upvotes: 3

Arpit
Arpit

Reputation: 394

You don't have the corresponding value to amount at '04:59:00' so it is is not possible to do so. If there is data available at these intervals then it would be possible

Upvotes: 0

Related Questions