I am trying to implement my first anomaly detection model with IsolationForest, but unfortunately it does not work.
I have a .csv file with various network parameters such as ip.ttl, frame.len, etc.
import pandas as pd

# Read in the data
quelle = pd.read_csv('./x.csv')
pdf = quelle.to_numpy()
print(quelle.columns)
Index([';ip.proto;ttl;frame.len;ip.src;ip.dst;ip.len;ip.flags;eth.src;eth.dst;eth.type;vlan.id;udp.port'], dtype='object')
print(quelle.shape)
(1658, 1)
However, when I try to fit the IsolationForest model on a single column such as ip.ttl or frame.len, I get an error:
from sklearn.ensemble import IsolationForest

model = IsolationForest(n_estimators=50, max_samples='auto', contamination=0.1, max_features=1.0)
model.fit(quelle[['frame.len']])
KeyError: "None of [Index(['frame.len'], dtype='object')] are in the [columns]"
Where is my mistake?
Thanks in advance
The DataFrame has many rows but only a single column:
print(quelle.shape)
(1658, 1)
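One way to confirm this is to list the column names and check which of the requested ones are missing before indexing. The sketch below uses a small in-memory stand-in for x.csv (the rows are invented for illustration) and reads it with the default settings, as in the question; the single column name turns out to be the whole header line.

```python
import io
import pandas as pd

# Invented stand-in for ./x.csv: semicolon-separated, like the question's header
sample = io.StringIO(
    ";ip.proto;ttl;frame.len\n"
    "0;17;64;60\n"
    "1;17;64;1514\n"
)

# Read with the default separator (a comma), as in the question
quelle = pd.read_csv(sample)

# Compare the requested names with what pandas actually produced
wanted = ["frame.len"]
missing = [c for c in wanted if c not in quelle.columns]
print(quelle.columns.tolist())  # a single name containing every semicolon
print("missing:", missing)
```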
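When you loaded the file, read_csv used its default delimiter, which is a comma. Your file is separated by semicolons, so pandas did not split each line into fields and instead packed every line into a single column. You can see this in the column name, which is the entire header line, semicolons included.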
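The difference is easy to reproduce. In the sketch below (again with invented sample rows), the default comma separator yields one column, an explicit `sep=";"` yields one column per field, and `sep=None` with `engine="python"` asks pandas to sniff the delimiter from the first row using Python's `csv.Sniffer`. The unnamed first header field becomes `"Unnamed: 0"`.

```python
import io
import pandas as pd

# Invented sample in the same layout as the question's header
text = (
    ";ip.proto;ttl;frame.len\n"
    "0;17;64;60\n"
    "1;17;64;1514\n"
    "2;6;128;54\n"
)

comma = pd.read_csv(io.StringIO(text))                                # default sep=","
explicit = pd.read_csv(io.StringIO(text), sep=";")                    # delimiter given
sniffed = pd.read_csv(io.StringIO(text), sep=None, engine="python")   # delimiter detected

print(comma.shape)     # (3, 1): every line ends up in one column
print(explicit.shape)  # (3, 4): one column per field
print(explicit.columns.tolist())
print(sniffed.shape)
```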
To fix this, specify the delimiter when reading the file:
quelle = pd.read_csv('./x.csv', sep=';')
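Putting it together, here is a minimal sketch of the corrected workflow, assuming the file is semicolon-separated as the header suggests. The leading `;` in the header means the first field has no name; it looks like a saved row index, so `index_col=0` is used here, which is an assumption about the file. The sample rows are invented, and `random_state=0` was added only to make the run repeatable. For the real data, pass `'./x.csv'` instead of the `io.StringIO(...)` object.

```python
import io
import pandas as pd
from sklearn.ensemble import IsolationForest

# Invented sample rows; for the real file use pd.read_csv('./x.csv', sep=';', index_col=0)
sample = io.StringIO(
    ";ip.proto;ttl;frame.len\n"
    "0;17;64;60\n"
    "1;17;64;62\n"
    "2;17;64;60\n"
    "3;17;64;61\n"
    "4;6;128;1514\n"
    "5;17;64;60\n"
    "6;17;64;63\n"
    "7;17;64;60\n"
    "8;17;64;62\n"
    "9;17;64;61\n"
)

# Split on semicolons; treat the unnamed first field as the index (assumption)
quelle = pd.read_csv(sample, sep=";", index_col=0)

model = IsolationForest(n_estimators=50, max_samples="auto",
                        contamination=0.1, max_features=1.0, random_state=0)
model.fit(quelle[["frame.len"]])

# predict() returns 1 for inliers and -1 for outliers
quelle["anomaly"] = model.predict(quelle[["frame.len"]])
print(quelle)
```

With `contamination=0.1`, roughly a tenth of the fitted rows are labelled as outliers, so on this toy sample only the unusually large frame is flagged.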