Reputation: 327
I am trying to apply an Isolation Forest to data that was converted from an event log, but I am getting "TypeError: invalid type promotion". Is it because of the datetime columns? I don't understand what I am doing wrong.
A portion of my table (after processing):
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| org:resource | lifecycle:transition | concept:name | time:timestamp | case:REG_DATE | case:concept:name | case:AMOUNT_REQ |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| 52 | 0 | 9 | 2011 10-01 38:44.5 | 2011 10-01 38:44.5 | 0 | 20000 |
| 52 | 0 | 6 | 2011 10-01 38:44.9 | 2011 10-01 38:44.5 | 2 | 20000 |
| 52 | 0 | 7 | 2011 10-01 39:37.9 | 2011 10-01 38:44.5 | 0 | 20000 |
| 52 | 1 | 19 | 2011 10-01 39:38.9 | 2011 10-01 38:44.5 | 1 | 20000 |
| 68 | 2 | 19 | 2011 10-01 36:46.4 | 2011 10-01 38:44.5 | 3 | 20000 |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
Calling df.info() prints:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262200 entries, 0 to 262199
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 org:resource 262200 non-null int64
1 lifecycle:transition 262200 non-null int64
2 concept:name 262200 non-null int64
3 time:timestamp 262200 non-null datetime64[ns]
4 case:REG_DATE 262200 non-null datetime64[ns]
5 case:concept:name 262200 non-null int64
6 case:AMOUNT_REQ 262200 non-null int32
dtypes: datetime64[ns](2), int32(1), int64(4)
memory usage: 13.0 MB
My code is:
from sklearn.ensemble import IsolationForest
contamination = 0.05
model = IsolationForest(contamination=contamination, n_estimators=10000)
model.fit(df)
df["iforest"] = pd.Series(model.predict(df))
df["iforest"] = df["iforest"].map({1: 0, -1: 1})
df["score"] = model.decision_function(df)
df.sort_values("score")
However I am getting the below error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-5edb86351ac8> in <module>
4
5 model = IsolationForest(contamination=contamination, n_estimators=10000)
----> 6 model.fit(df)
7
8 df["iforest"] = pd.Series(model.predict(df))
~\.conda\envs\process_mining\lib\site-packages\sklearn\ensemble\_iforest.py in fit(self, X, y, sample_weight)
261 )
262
--> 263 X = check_array(X, accept_sparse=['csc'])
264 if issparse(X):
265 # Pre-sort indices to avoid that each individual tree of the
~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
531
532 if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 533 dtype_orig = np.result_type(*dtypes_orig)
534
535 if dtype_numeric:
<__array_function__ internals> in result_type(*args, **kwargs)
TypeError: invalid type promotion
Upvotes: 0
Views: 563
Reputation: 327
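I found a solution with the help of this answer: Python - linear regression TypeError: invalid type promotion
The error occurs because the DataFrame mixes datetime64[ns] columns with integer columns, and NumPy cannot promote them to a common dtype when scikit-learn validates the input. Converting the timestamps to ordinals makes every column numeric, and then it works. Note that toordinal() keeps only the date, so the time of day is dropped. I did the conversion using: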
import datetime as dt
df['time:timestamp'] = df['time:timestamp'].map(dt.datetime.toordinal)
df['case:REG_DATE'] = df['case:REG_DATE'].map(dt.datetime.toordinal)
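If the time of day matters (the events in the question are only seconds apart, so their ordinals would be identical), one possible alternative is to convert each datetime column to seconds since the Unix epoch instead. The following is only a sketch under assumptions: the helper name datetimes_to_seconds and the sample values are made up for illustration, and only the column names and dtypes are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; only the column names and dtypes mirror the question.
df = pd.DataFrame({
    "org:resource": [52, 52, 68],
    "time:timestamp": pd.to_datetime([
        "2011-10-01 00:38:44.5",
        "2011-10-01 00:38:44.9",
        "2011-10-01 00:39:37.9",
    ]),
    "case:AMOUNT_REQ": [20000, 20000, 20000],
})


def datetimes_to_seconds(frame):
    """Return a copy in which every datetime column is float seconds since 1970-01-01."""
    out = frame.copy()
    epoch = pd.Timestamp("1970-01-01")
    for col in out.select_dtypes(include="datetime").columns:
        # Subtracting the epoch gives a timedelta; dividing by one second gives a float.
        out[col] = (out[col] - epoch) / pd.Timedelta(seconds=1)
    return out


features = datetimes_to_seconds(df)

# This is the call that failed in the traceback; it succeeds once all columns are numeric.
common_dtype = np.result_type(*features.dtypes)
print(features.dtypes)
print(common_dtype)
```

The resulting features frame contains only numeric columns, so it should be accepted by model.fit(features), and the original df keeps its datetime columns for later inspection.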
Upvotes: 1