Reputation: 87
I have time series as shown below:
date_time system_load date month_year year month day hour load_group
0 2013-01-01 00:00:00 17.2 2013-01-01 2013-01 2013 1 1 0 (15, 20]
1 2013-01-01 01:00:00 16 2013-01-01 2013-01 2013 1 1 1 (15, 20]
2 2013-01-01 02:00:00 15 2013-01-01 2013-01 2013 1 1 2 (15, 20]
3 2013-01-01 03:00:00 15 2013-01-01 2013-01 2013 1 1 3 (15, 20]
4 2013-01-01 04:00:00 14 2013-01-01 2013-01 2013 1 1 4 (12, 15]
... ... ... ... ... ... ... ... ... ...
70123 2020-12-31 19:00:00 23 2020-12-31 2020-12 2020 12 31 19 (20, 25]
... ... ... ... ... ... ... ... ... ...
I want to get only the category (20,50] using "get_group" function as follow:
df0.groupby('load_group').get_group('(20, 25]')
I don't know how solve that problem.
I got the following : "KeyError: '(20, 25]'"
Also the same error using the following code:
df0.groupby('month_year').get_group('2016-07')
"KeyError: '2016-07'
I print df0.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date_time 70128 non-null datetime64[ns]
1 system_load 70128 non-null float64
2 date 70128 non-null object
3 month_year 70128 non-null period[M]
4 year 70128 non-null int64
5 month 70128 non-null int64
6 day 70128 non-null int64
7 hour 70128 non-null int64
8 load_group 70001 non-null category
Upvotes: 0
Views: 1039
Reputation: 4658
You must convert the group names to the exact object types which they were stored. Note the dtypes
in your df0.info()
:
month_year
is period[M]
instead of str
.load_group
is a category column containing pd.Interval
elements, not str
elements. Although this is not shown explicitly, but that should be how the data was normally produced. If they were str
, your code would have worked.The following code reproduces your df0.info()
. Here I use df
instead of df0
.
import pandas as pd
import io
df = pd.read_csv(io.StringIO("""
date_time system_load date month_year year month day hour load_group
2013-01-01 00:00:00 17.2 2013-01-01 2013-01 2013 1 1 0 (15, 20]
2013-01-01 01:00:00 16 2013-01-01 2013-01 2013 1 1 1 (15, 20]
2013-01-01 02:00:00 15 2013-01-01 2013-01 2013 1 1 2 (15, 20]
2013-01-01 03:00:00 15 2013-01-01 2013-01 2013 1 1 3 (15, 20]
2013-01-01 04:00:00 14 2013-01-01 2013-01 2013 1 1 4 (12, 15]
2020-12-31 19:00:00 23 2020-12-31 2020-12 2020 12 31 19 (20, 25]
"""), sep=r"\s{2,}", engine='python')
df["date_time"] = pd.to_datetime(df["date_time"])
# restore interval type for the elements
df["load_group"] = df["load_group"].apply(lambda s: pd.Interval(int(s[1:3]), int(s[5:7]), "right"))
# restore category type for the column
df["load_group"] = df["load_group"].astype("category")
# restore monthly period type
df["month_year"] = pd.to_datetime(df["month_year"]).dt.to_period("M")
gp = pd.Interval(20, 25, "right") # (20, 25]
print(df.groupby("load_group").get_group(gp))
Out[138]:
date_time system_load date ... day hour load_group
5 2020-12-31 19:00:00 23.0 2020-12-31 ... 31 19 (20, 25]
[1 rows x 9 columns]
gp = pd.Period(pd.to_datetime("2013-01"), freq="M")
# or equivalently, pd.Period(freq="M", year=2013, month=1)
print(df.groupby('month_year').get_group(gp))
Out[140]:
date_time system_load date ... day hour load_group
0 2013-01-01 00:00:00 17.2 2013-01-01 ... 1 0 (15, 20]
1 2013-01-01 01:00:00 16.0 2013-01-01 ... 1 1 (15, 20]
2 2013-01-01 02:00:00 15.0 2013-01-01 ... 1 2 (15, 20]
3 2013-01-01 03:00:00 15.0 2013-01-01 ... 1 3 (15, 20]
4 2013-01-01 04:00:00 14.0 2013-01-01 ... 1 4 (12, 15]
[5 rows x 9 columns]
Upvotes: 1