Reputation: 59
I have a list of datetime string values. I wanted to have them in hourly distribution from Hours 00 to 24.
For example, a sample list:
['2021-06-18 14:39:54', '2021-06-18 08:30:26', '2021-06-18 15:07:12', '2021-06-18 13:13:29', '2021-06-18 11:27:48', '2021-06-19 09:25:26', '2021-06-19 16:14:38', '2021-06-19 13:41:31', '2021-06-19 15:09:42', '2021-06-19 03:24:29']
and expected dataframe view:
date 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
0 2021-06-18 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0
1 2021-06-19 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0
I converted the values to datetime by :
time = datetime.strptime(item, '%Y-%m-%d %H:%M:%S')
and got the hours
hr = time.hour
In can get the result using if else , but its too naive. Is their a way to do this elegantly.
TIA.
Upvotes: 0
Views: 680
Reputation: 607
You can do this with dataframe.loc. First get indices of dataframe and create it:
a = ['2021-06-18 14:39:54', '2021-06-18 08:30:26', '2021-06-18 15:07:12', '2021-06-18 13:13:29', '2021-06-18 11:27:48', '2021-06-19 09:25:26', '2021-06-19 16:14:38', '2021-06-19 13:41:31', '2021-06-19 15:09:42', '2021-06-19 03:24:29']
index = set([x.split()[0] for x in a])
df = pd.DataFrame(index=index, columns=range(1,25), data=0)
Then set hours column 1:
for t in a:
time = datetime.strptime(t, '%Y-%m-%d %H:%M:%S')
idx = t.split()[0]
df.loc[idx][time.hour] = 1
Upvotes: 0
Reputation: 11415
With l
your list you can convert it do a Series of datetimes using pd.to_datetime
>>> s = pd.Series(l).transform(pd.to_datetime)
>>> s
0 2021-06-18 14:39:54
1 2021-06-18 08:30:26
2 2021-06-18 15:07:12
3 2021-06-18 13:13:29
4 2021-06-18 11:27:48
5 2021-06-19 09:25:26
6 2021-06-19 16:14:38
7 2021-06-19 13:41:31
8 2021-06-19 15:09:42
9 2021-06-19 03:24:29
dtype: datetime64[ns]
From there, there’s many ways to do this, using the .dt
accessor values, here you’re interested in .dt.date
and .dt.hour
.
The shortest is I believe using pd.crosstab
:
>>> table = pd.crosstab(s.dt.date, s.dt.hour, rownames=['date'], colnames=['hour'])
>>> table
hour 3 8 9 11 13 14 15 16
date
2021-06-18 0 1 0 1 1 1 1 0
2021-06-19 1 0 1 0 1 0 1 1
If you want to have all missing hours as well, you can of course reindex
:
>>> table.reindex(columns=pd.RangeIndex(0, 24), fill_value=0)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
date
2021-06-18 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0
2021-06-19 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0
Upvotes: 2