Reputation: 1225
I have the following problem. I have this data:
import pandas as pd

data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia", "Czechia", "USA", "Czechia"],
    }
)
I need to count how many times each country occurs in the dataset in each five-minute interval, i.e. from 2021-11-30 00:00:00 to 2021-11-30 00:04:59, from 2021-11-30 00:05:00 to 2021-11-30 00:09:59, and so on. Countries should be transformed into new columns. The desired output is:
output = pd.DataFrame(
    {
        "Czechia": [2, 0, 1],
        "USA": [1, 0, 0],
    }
)
How can I do this, please?
Upvotes: 0
Views: 23
Reputation: 862481
Use pd.Grouper with GroupBy.size, then reshape with Series.unstack and DataFrame.asfreq:
data["time"] = pd.to_datetime(data["time"])
df = (data.groupby([pd.Grouper(freq='5Min', key='time'), 'country'])
.size()
.unstack(fill_value=0)
.asfreq('5Min', fill_value=0))
print (df)
country              Czechia  USA
time
2021-11-30 00:00:00        2    1
2021-11-30 00:05:00        0    0
2021-11-30 00:10:00        1    0
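As a side note, roughly the same table can be built with pd.crosstab instead of groupby; a minimal sketch, assuming data["time"] has already been converted to datetime as above:

# Sketch of an equivalent crosstab-based approach:
# floor each timestamp to its 5-minute window, cross-tabulate windows vs. countries,
# then insert the empty windows as all-zero rows
counts = pd.crosstab(data["time"].dt.floor("5Min"), data["country"]).asfreq("5Min", fill_value=0)
print(counts)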
Upvotes: 1
Reputation: 29742
One way using pandas.to_datetime with floor, then groupby:
data["time"] = pd.to_datetime(data["time"]).dt.floor("5min")
new_df = data.groupby("time")["country"].value_counts().unstack().asfreq("5min").fillna(0)
print(new_df)
Output:
country              Czechia  USA
time
2021-11-30 00:00:00      2.0  1.0
2021-11-30 00:05:00      0.0  0.0
2021-11-30 00:10:00      1.0  0.0
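If integer counts are preferred, a small optional follow-up (the floats only appear because unstack and asfreq introduce NaN for missing entries before fillna(0)):

# Optional: cast the filled counts back to integers
new_df = new_df.astype(int)
print(new_df)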
Upvotes: 1