Aezak
Aezak

Reputation: 59

Converting datetime list to an hourly distribution dataframe

I have a list of datetime string values. I wanted to have them in hourly distribution from Hours 00 to 24.

For example, a sample list:

['2021-06-18 14:39:54', '2021-06-18 08:30:26', '2021-06-18 15:07:12', '2021-06-18 13:13:29', '2021-06-18 11:27:48', '2021-06-19 09:25:26', '2021-06-19 16:14:38', '2021-06-19 13:41:31', '2021-06-19 15:09:42', '2021-06-19 03:24:29']

and expected dataframe view:

         date  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
0  2021-06-18  0  0  0  0  0  0  0  0  1  0  0  1  0  1  1  1  0  0  0  0  0  0  0  0  0
1  2021-06-19  0  0  0  1  0  0  0  0  0  1  0  0  0  1  0  1  1  0  0  0  0  0  0  0  0

I converted the values to datetime by :

time = datetime.strptime(item, '%Y-%m-%d %H:%M:%S')

and got the hours

hr = time.hour

In can get the result using if else , but its too naive. Is their a way to do this elegantly.

TIA.

Upvotes: 0

Views: 680

Answers (2)

Ali Irani
Ali Irani

Reputation: 607

You can do this with dataframe.loc. First get indices of dataframe and create it:

a = ['2021-06-18 14:39:54', '2021-06-18 08:30:26', '2021-06-18 15:07:12', '2021-06-18 13:13:29', '2021-06-18 11:27:48', '2021-06-19 09:25:26', '2021-06-19 16:14:38', '2021-06-19 13:41:31', '2021-06-19 15:09:42', '2021-06-19 03:24:29']

index = set([x.split()[0] for x in a])
df = pd.DataFrame(index=index, columns=range(1,25), data=0)

Then set hours column 1:

for t in a:
    time = datetime.strptime(t, '%Y-%m-%d %H:%M:%S')
    idx = t.split()[0]
    df.loc[idx][time.hour] = 1

Upvotes: 0

Cimbali
Cimbali

Reputation: 11415

With l your list you can convert it do a Series of datetimes using pd.to_datetime

>>> s = pd.Series(l).transform(pd.to_datetime)
>>> s
0   2021-06-18 14:39:54
1   2021-06-18 08:30:26
2   2021-06-18 15:07:12
3   2021-06-18 13:13:29
4   2021-06-18 11:27:48
5   2021-06-19 09:25:26
6   2021-06-19 16:14:38
7   2021-06-19 13:41:31
8   2021-06-19 15:09:42
9   2021-06-19 03:24:29
dtype: datetime64[ns]

From there, there’s many ways to do this, using the .dt accessor values, here you’re interested in .dt.date and .dt.hour. The shortest is I believe using pd.crosstab:

>>> table = pd.crosstab(s.dt.date, s.dt.hour, rownames=['date'], colnames=['hour'])
>>> table
hour        3   8   9   11  13  14  15  16
date                                      
2021-06-18   0   1   0   1   1   1   1   0
2021-06-19   1   0   1   0   1   0   1   1

If you want to have all missing hours as well, you can of course reindex:

>>> table.reindex(columns=pd.RangeIndex(0, 24), fill_value=0)
            0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23
date                                                                                                      
2021-06-18   0   0   0   0   0   0   0   0   1   0   0   1   0   1   1   1   0   0   0   0   0   0   0   0
2021-06-19   0   0   0   1   0   0   0   0   0   1   0   0   0   1   0   1   1   0   0   0   0   0   0   0

Upvotes: 2

Related Questions