PythonNoob
PythonNoob

Reputation: 21

Pandas - Adding together a value from numerous date/times

I read an excel file to get a data frame that looks like this;

Date    Lane    Lane Name   Direction   DirectionName   Speed (mph) Headway (s) Gap (s) Flags   Flag Text
0   2018-02-02 00:00:03.000 6   SB_NS   2   South   38.525  NaN NaN 5   Friday
1   2018-02-02 00:00:22.010 5   SB_MID  2   South   32.310  NaN NaN 5   Friday
2   2018-02-02 00:00:22.020 4   SB_OS   2   South   44.739  NaN NaN 5   Friday
3   2018-02-02 00:00:36.040 6   SB_NS   2   South   33.554  NaN NaN 5   Friday
4   2018-02-02 00:00:49.070 6   SB_NS   2   South   39.768  12.300  11.847  5   Friday
... ... ... ... ... ... ... ... ... ... ...
503763  2018-02-27 23:59:00.090 2   NB_MID  1   North   32.932  4.415   3.833   2   Tuesday
503764  2018-02-27 23:59:29.090 6   SB_NS   2   South   29.825  65.500  64.700  2   Tuesday
503765  2018-02-27 23:59:32.050 4   SB_OS   2   South   29.205  236.000 235.848 2   Tuesday
503766  2018-02-27 23:59:33.070 6   SB_NS   2   South   37.283  3.330   3.462   2   Tuesday
503767  2018-02-27 23:59:58.050 1   NB_NS   1   North   36.661  76.000  75.669  2   Tuesday
503768 rows × 10 columns

I use this code:

## ONLY SHOW 02-02-2018 and SOUTH and COUNT
df = df.loc[(df.DirectionName =="South")] 
df['Date'] = pd.to_datetime(df['Date'])
start_date = '2018-02-02'
end_date = '2018-02-03'
mask = (df['Date'] > start_date) & (df['Date'] <= end_date)
df = df.loc[mask]
df.groupby([pd.Grouper(key='Date',freq='H'),df.DirectionName]).size().reset_index(name='count')

to get this output:

Date    DirectionName   count
0   2018-02-02 00:00:00 South   250
1   2018-02-02 01:00:00 South   137
2   2018-02-02 02:00:00 South   92
3   2018-02-02 03:00:00 South   92
4   2018-02-02 04:00:00 South   110
5   2018-02-02 05:00:00 South   272
6   2018-02-02 06:00:00 South   832
7   2018-02-02 07:00:00 South   1972
8   2018-02-02 08:00:00 South   2106
9   2018-02-02 09:00:00 South   1695
10  2018-02-02 10:00:00 South   1493
11  2018-02-02 11:00:00 South   1716
12  2018-02-02 12:00:00 South   1970
13  2018-02-02 13:00:00 South   2081
14  2018-02-02 14:00:00 South   2363
15  2018-02-02 15:00:00 South   2583
16  2018-02-02 16:00:00 South   2746
17  2018-02-02 17:00:00 South   2647
18  2018-02-02 18:00:00 South   2107
19  2018-02-02 19:00:00 South   1521
20  2018-02-02 20:00:00 South   1047
21  2018-02-02 21:00:00 South   851
22  2018-02-02 22:00:00 South   813
23  2018-02-02 23:00:00 South   557

The count is volume of data within that time frame, for example a volume of 250 between 0:00 - 1:00 on the 02/02/2012.

I want to be able to generate this output for every friday of the month (02/02/2018 + 09/02/2018 + 16/02/2018 + 23/02/2018). and then Add up the total for each hour across the four dates.

However, I am having trouble even being able to generate this output on the same notebook for each date. I can do it once in the same notebook, no problem. I can do it in separate notebooks, no problem. However I can not get it to work all at once within the same note book.

I have tried changing the date to include all dates within the range...

## ONLY SHOW 02-02-2018 and SOUTH and then COUNT
df = df.loc[(df.DirectionName =="South")] 
df['Date'] = pd.to_datetime(df['Date'])
start_date = '2018-02-02' #CHANGED#
end_date = '2018-02-24' #CHANGED#
mask = (df['Date'] > start_date) & (df['Date'] <= end_date)
df = df.loc[mask]
df.groupby([pd.Grouper(key='Date',freq='H'),df.DirectionName]).size().reset_index(name='count')

But this will then obviously include dates which I don't want....

Date    DirectionName   count1
0   2018-02-02 00:00:00 South   250
1   2018-02-02 01:00:00 South   137
2   2018-02-02 02:00:00 South   92
3   2018-02-02 03:00:00 South   92
4   2018-02-02 04:00:00 South   110
... ... ... ...
163 2018-02-23 19:00:00 South   1490
164 2018-02-23 20:00:00 South   1056
165 2018-02-23 21:00:00 South   858
166 2018-02-23 22:00:00 South   783
167 2018-02-23 23:00:00 South   563

And even using this method, I am confused as to how I would ADD up all of the counts for the specific time+dates. I am doing this so that I would then be able to calculate the AVERAGE COUNT at 9am for Fridays during that month.

For example add together the count at 9am for 02/02/2018 + 09/02/2018 + 16/02/2018 + 23/02/2018.

I hope I have explained my problem clearly. I don't think my method is suitable for what I am trying to do. I would greatly appreciate some help / advice.

EDIT

When I update with using Quang Hoang's code, I now get this output below. Ideally, I still only want to output CERTAIN dates, which are : 02/02/2018, 09/02/2018, 16/02/2018, 23/02/2018

Date              Date  Date  DirectionName
2018-02-02 00:00  4     0     South             250
2018-02-02 01:00  4     1     South             137
2018-02-02 02:00  4     2     South              92
2018-02-02 03:00  4     3     South              92
2018-02-02 04:00  4     4     South             110
                                               ... 
2018-02-23 19:00  4     19    South            1490
2018-02-23 20:00  4     20    South            1056
2018-02-23 21:00  4     21    South             858
2018-02-23 22:00  4     22    South             783
2018-02-23 23:00  4     23    South             563
Length: 168, dtype: int64

Also, the date columns for this output are off, for example one of the columns should be hours not date

Upvotes: 0

Views: 39

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

You are looking at groupby().size(). However, there are several things you want to groupby here:

1. The hours, e.g. `9am`
2. The days of the week, e.g `Friday`
3. The months, e.g. `2018-02`
4. Possibly `DirectionName`

Let try to extract them:

df['Date'] = pd.to_datetime(df['Date'])

hours = df['Date'].dt.hour
days = df['Date'].dt.dayofweek
months = df['Date'].dt.to_period('M')

# exclude `'DirectionName'` if you want
df.groupby([months, days, hours, 'DirectionName']).size()

Upvotes: 1

Related Questions