Sam.H

Reputation: 327

Isolation Forest - TypeError: invalid type promotion

I am trying to apply Isolation Forest to my data, which was converted from an event log, but I am getting "TypeError: invalid type promotion". Is it because of the datetime columns? I don't understand what I am doing wrong.

A portion of my table (after processing):

+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| org:resource | lifecycle:transition | concept:name |   time:timestamp   |   case:REG_DATE    | case:concept:name | case:AMOUNT_REQ |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
|           52 |                    0 |            9 | 2011 10-01 38:44.5 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    0 |            6 | 2011 10-01 38:44.9 | 2011 10-01 38:44.5 |                 2 |           20000 |
|           52 |                    0 |            7 | 2011 10-01 39:37.9 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    1 |           19 | 2011 10-01 39:38.9 | 2011 10-01 38:44.5 |                 1 |           20000 |
|           68 |                    2 |           19 | 2011 10-01 36:46.4 | 2011 10-01 38:44.5 |                 3 |           20000 |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+

This is the output of df.info():

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262200 entries, 0 to 262199
Data columns (total 7 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   org:resource          262200 non-null  int64         
 1   lifecycle:transition  262200 non-null  int64         
 2   concept:name          262200 non-null  int64         
 3   time:timestamp        262200 non-null  datetime64[ns]
 4   case:REG_DATE         262200 non-null  datetime64[ns]
 5   case:concept:name     262200 non-null  int64         
 6   case:AMOUNT_REQ       262200 non-null  int32         
dtypes: datetime64[ns](2), int32(1), int64(4)
memory usage: 13.0 MB

My code is:

from sklearn.ensemble import IsolationForest

contamination = 0.05

model = IsolationForest(contamination=contamination, n_estimators=10000)
model.fit(df)

df["iforest"] = pd.Series(model.predict(df))
df["iforest"] = df["iforest"].map({1: 0, -1: 1})
df["score"] = model.decision_function(df)
df.sort_values("score")

However, I am getting the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-5edb86351ac8> in <module>
      4 
      5 model = IsolationForest(contamination=contamination, n_estimators=10000)
----> 6 model.fit(df)
      7 
      8 df["iforest"] = pd.Series(model.predict(df))

~\.conda\envs\process_mining\lib\site-packages\sklearn\ensemble\_iforest.py in fit(self, X, y, sample_weight)
    261                 )
    262 
--> 263         X = check_array(X, accept_sparse=['csc'])
    264         if issparse(X):
    265             # Pre-sort indices to avoid that each individual tree of the

~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    531 
    532         if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 533             dtype_orig = np.result_type(*dtypes_orig)
    534 
    535     if dtype_numeric:

<__array_function__ internals> in result_type(*args, **kwargs)

TypeError: invalid type promotion

Upvotes: 0

Views: 563

Answers (1)

Sam.H

Reputation: 327

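I found a solution with the help of this answer: Python - linear regression TypeError: invalid type promotion

As the traceback shows, check_array calls np.result_type on the column dtypes, and NumPy cannot promote int64 and datetime64[ns] to a common type. Converting the timestamp columns to ordinals makes every column numeric, and then it works. I did the conversion using: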

import datetime as dt

df['time:timestamp'] = df['time:timestamp'].map(dt.datetime.toordinal)
df['case:REG_DATE'] = df['case:REG_DATE'].map(dt.datetime.toordinal)
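
One caveat: toordinal() keeps only the calendar day, so events that happen on the same day all get the same value. If the time of day matters for the anomaly detection, a possible alternative is to convert to seconds since the Unix epoch instead. The snippet below is only a minimal sketch of that idea; the toy DataFrame and its values are made up for illustration and are not taken from the real event log.

```python
import pandas as pd

# Toy frame with two events on the same day, a few seconds apart
# (illustrative values only, not from the real event log).
df = pd.DataFrame({
    "time:timestamp": pd.to_datetime(
        ["2011-10-01 00:38:44.5", "2011-10-01 00:39:37.9"]
    ),
    "case:AMOUNT_REQ": [20000, 20000],
})

# toordinal() keeps only the calendar day, so both rows get the same value.
ordinals = df["time:timestamp"].map(pd.Timestamp.toordinal)

# Seconds since the Unix epoch keep the time of day, as a float column.
epoch = pd.Timestamp("1970-01-01")
seconds = (df["time:timestamp"] - epoch).dt.total_seconds()

# Replace the datetime column with the numeric one before fitting a model.
df["time:timestamp"] = seconds
```

After this, every column is numeric, so the type promotion that failed in check_array should no longer involve a datetime64 dtype. Whether ordinals or epoch seconds are the better feature depends on the data.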

Upvotes: 1
