Reputation: 467
I'm imputing missing values in a pipeline.
In a first step I bin the variable using:
df_listings['original.listing.rooms.bedrooms.count'] = pd.cut(df_listings['original.listing.rooms.bedrooms.count'], bins = [1,2,3,4,5,6,10,50])
df_listings = df_listings.fillna(np.nan)
In a second step I want to impute the column using:
si = SimpleImputer(missing_values=np.nan,strategy="most_frequent")
si.fit_transform(df_listings[['original.listing.rooms.bedrooms.count']])
Even though I followed this flow with other variables, here I get the following error:
TypeError: unsupported operand type(s) for +: 'pandas._libs.interval.Interval' and 'pandas._libs.interval.Interval'
The above exception was the direct cause of the following exception:
...
SystemError: <built-in function _abc_instancecheck> returned a result with an error set
I don't understand why I get this error. With other variables I can work with pandas.Interval values without problems; only this variable causes an issue.
Upvotes: 1
Views: 621
Reputation: 467
pd.cut returns categorical data whose categories are pandas.Interval objects. Converting the result to strings by appending .astype(str) to the pd.cut call does the trick:
df_listings['original.listing.rooms.bedrooms.count'] = pd.cut(df_listings['original.listing.rooms.bedrooms.count'], bins = [1,2,3,4,5,6,10,50]).astype(str)
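One thing to watch for: depending on the pandas version, .astype(str) may turn missing values into the literal string 'nan', in which case SimpleImputer(missing_values=np.nan) no longer sees them as missing. Below is a minimal sketch of one way around that, converting the intervals to strings while keeping real NaN values. The toy DataFrame and the column names bedrooms and bedrooms_bin are made up for illustration and are not from the original data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the bedrooms column.
df = pd.DataFrame({"bedrooms": [1.5, 2.5, 2.5, np.nan, 7.0, np.nan]})

# Bin as in the question; missing inputs stay missing after pd.cut.
binned = pd.cut(df["bedrooms"], bins=[1, 2, 3, 4, 5, 6, 10, 50])

# Turn each Interval into its string label, but keep missing entries as
# np.nan so that SimpleImputer(missing_values=np.nan) can still detect them.
df["bedrooms_bin"] = pd.Series(
    [str(v) if pd.notna(v) else np.nan for v in binned],
    index=binned.index,
    dtype=object,
)

# Impute the missing bins with the most frequent bin label.
si = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
imputed = si.fit_transform(df[["bedrooms_bin"]])
print(imputed.ravel())
```

Another option that should behave similarly is to pass string labels to pd.cut through its labels parameter, so the categories are plain strings from the start.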
Upvotes: 1