swhh
swhh

Reputation: 87

KeyError with using get_group in python pandas

I have time series as shown below:

                   date_time  system_load    date    month_year  year  month  day  hour   load_group
0     2013-01-01 00:00:00     17.2      2013-01-01    2013-01  2013      1    1     0  (15, 20]
1     2013-01-01 01:00:00     16        2013-01-01    2013-01  2013      1    1     1  (15, 20]
2     2013-01-01 02:00:00     15        2013-01-01    2013-01  2013      1    1     2  (15, 20]
3     2013-01-01 03:00:00     15        2013-01-01    2013-01  2013      1    1     3  (15, 20]
4     2013-01-01 04:00:00     14        2013-01-01    2013-01  2013      1    1     4  (12, 15]
...                   ...          ...         ...        ...   ...    ...  ...   ...             ...
70123 2020-12-31 19:00:00     23        2020-12-31    2020-12  2020     12   31    19  (20, 25]
...                   ...          ...         ...        ...   ...    ...  ...   ...             ...

I want to get only the category (20,50] using "get_group" function as follow:

df0.groupby('load_group').get_group('(20, 25]')

I don't know how solve that problem.

I got the following : "KeyError: '(20, 25]'"

Also the same error using the following code:

df0.groupby('month_year').get_group('2016-07')

"KeyError: '2016-07'

I print df0.info()

 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   date_time    70128 non-null  datetime64[ns]
 1   system_load  70128 non-null  float64       
 2   date         70128 non-null  object        
 3   month_year   70128 non-null  period[M]     
 4   year         70128 non-null  int64         
 5   month        70128 non-null  int64         
 6   day          70128 non-null  int64         
 7   hour         70128 non-null  int64         
 8   load_group   70001 non-null  category      

Upvotes: 0

Views: 1039

Answers (1)

Bill Huang
Bill Huang

Reputation: 4658

You must convert the group names to the exact object types which they were stored. Note the dtypes in your df0.info():

  • month_year is period[M] instead of str.
  • load_group is a category column containing pd.Interval elements, not str elements. Although this is not shown explicitly, but that should be how the data was normally produced. If they were str, your code would have worked.

Restore Data and Dtypes

The following code reproduces your df0.info(). Here I use df instead of df0.

import pandas as pd
import io

df = pd.read_csv(io.StringIO("""
date_time            system_load    date      month_year  year  month  day  hour   load_group
2013-01-01 00:00:00     17.2      2013-01-01     2013-01  2013      1    1     0   (15, 20]
2013-01-01 01:00:00     16        2013-01-01     2013-01  2013      1    1     1   (15, 20]
2013-01-01 02:00:00     15        2013-01-01     2013-01  2013      1    1     2   (15, 20]
2013-01-01 03:00:00     15        2013-01-01     2013-01  2013      1    1     3   (15, 20]
2013-01-01 04:00:00     14        2013-01-01     2013-01  2013      1    1     4   (12, 15]
2020-12-31 19:00:00     23        2020-12-31     2020-12  2020     12   31    19   (20, 25]
"""), sep=r"\s{2,}", engine='python')

df["date_time"] = pd.to_datetime(df["date_time"])

# restore interval type for the elements
df["load_group"] = df["load_group"].apply(lambda s: pd.Interval(int(s[1:3]), int(s[5:7]), "right"))
# restore category type for the column
df["load_group"] = df["load_group"].astype("category")

# restore monthly period type    
df["month_year"] = pd.to_datetime(df["month_year"]).dt.to_period("M")

Code

Interval Column

gp = pd.Interval(20, 25, "right")   # (20, 25]
print(df.groupby("load_group").get_group(gp))

Out[138]: 
            date_time  system_load        date  ... day  hour  load_group
5 2020-12-31 19:00:00         23.0  2020-12-31  ...  31    19    (20, 25]

[1 rows x 9 columns]

Period Column

gp = pd.Period(pd.to_datetime("2013-01"), freq="M")
# or equivalently, pd.Period(freq="M", year=2013, month=1)

print(df.groupby('month_year').get_group(gp))

Out[140]: 
            date_time  system_load        date  ... day  hour  load_group
0 2013-01-01 00:00:00         17.2  2013-01-01  ...   1     0    (15, 20]
1 2013-01-01 01:00:00         16.0  2013-01-01  ...   1     1    (15, 20]
2 2013-01-01 02:00:00         15.0  2013-01-01  ...   1     2    (15, 20]
3 2013-01-01 03:00:00         15.0  2013-01-01  ...   1     3    (15, 20]
4 2013-01-01 04:00:00         14.0  2013-01-01  ...   1     4    (12, 15]
[5 rows x 9 columns]

Upvotes: 1

Related Questions