Sam

Reputation: 467

sklearn SimpleImputer SystemError: <built-in function _abc_instancecheck> returned a result with an error set

I'm imputing missing values in a pipeline.

In a first step I bin the variable using:

df_listings['original.listing.rooms.bedrooms.count'] = pd.cut(df_listings['original.listing.rooms.bedrooms.count'], bins = [1,2,3,4,5,6,10,50])
df_listings = df_listings.fillna(np.nan)

In a second step I want to impute the column using:

si = SimpleImputer(missing_values=np.nan,strategy="most_frequent")
si.fit_transform(df_listings[['original.listing.rooms.bedrooms.count']])

Even though I followed this flow with other variables, here I get the following error:

TypeError: unsupported operand type(s) for +: 'pandas._libs.interval.Interval' and 'pandas._libs.interval.Interval'

The above exception was the direct cause of the following exception:


...

SystemError: <built-in function _abc_instancecheck> returned a result with an error set

I can't work out why I get this error. With other variables I can work with pandas.Interval values; only this variable causes an issue.

Upvotes: 1

Views: 621

Answers (1)

Sam

Reputation: 467

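pd.cut returns categorical data whose values are pandas Interval objects, and that is what SimpleImputer trips over. Adding .astype(str) to the pd.cut call does the trick, because the imputer then sees plain strings instead: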

df_listings['original.listing.rooms.bedrooms.count'] = pd.cut(df_listings['original.listing.rooms.bedrooms.count'], bins = [1,2,3,4,5,6,10,50]).astype(str)
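For anyone who wants to try this on a small example, here is a minimal sketch. The DataFrame `df`, the column names `bedrooms` and `bedrooms_bin`, and the values are made up for illustration. One thing worth checking on your own pandas version is whether .astype(str) keeps missing values as NaN or turns them into the literal string "nan", in which case the imputer would no longer treat them as missing. To stay on the safe side, this variant converts only the non-missing Interval labels to strings:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data standing in for the bedrooms column (values are made up).
df = pd.DataFrame({"bedrooms": [2, 3, 3, np.nan, 5, 3, 12]})

bins = [1, 2, 3, 4, 5, 6, 10, 50]
binned = pd.cut(df["bedrooms"], bins=bins)

# Turn each Interval into its string label; na_action="ignore" leaves
# NaN untouched so SimpleImputer can still recognise it as missing.
df["bedrooms_bin"] = binned.astype(object).map(str, na_action="ignore")

si = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
imputed = si.fit_transform(df[["bedrooms_bin"]])
```

After this, `imputed` is a 2-D array of string labels in which the missing row has been filled with the most frequent bin.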

Upvotes: 1
