Data preprocessing of click stream data in real time

Question

I am working on a project to detect anomalies in web users' activity in real time. Any ill-intentioned or malicious user activity has to be detected as it happens. The input is users' clickstream data. Each click record contains the user ID (unique user ID), click URL (URL of the web page), click text (the text/function on the website that the user clicked), and information (anything typed by the user). The project is similar to an intrusion detection system (IDS). I am using Python 3.6 and have the following questions:

Which is the best approach to data preprocessing, considering that all the attributes in the dataset are categorical?
Encoding methods like one-hot encoding or label encoding could be applied, but the data has to be processed in real time, which makes them difficult to apply.
As per the project requirements, three columns (click URL, click text, and typed information) are considered feature columns.
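
To make the difficulty concrete, here is a minimal sketch using only the standard library. It shows that a label encoder fitted on an earlier batch has no code for a URL that first appears in the live stream, and contrasts it with a stateless hash-based encoding, which is one commonly used workaround and not necessarily the right answer here. The column names (`click_url`, `click_text`, `information`), the helper names, and the bucket count are all hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical names for the three feature columns described above.
FEATURE_COLUMNS = ("click_url", "click_text", "information")


def fit_label_encoder(values):
    """Map each distinct value seen in a batch to an integer id."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def hash_encode(value, n_buckets=1024):
    """Stateless encoding: hash a categorical value into a fixed range.

    No fitting step is needed, so unseen values can be encoded as they
    arrive. Distinct values may collide in the same bucket.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def encode_click(click, n_buckets=1024):
    """Encode one click record (a dict) into one bucket index per column."""
    return [
        hash_encode(col + "=" + click[col], n_buckets)
        for col in FEATURE_COLUMNS
    ]


# A label encoder fitted on historical data knows only those URLs.
encoder = fit_label_encoder(["/home", "/login"])
unseen_url = "/admin/export"
print(unseen_url in encoder)  # the new URL has no label

# The hash-based encoding handles the same record without any fitting.
click = {
    "click_url": unseen_url,
    "click_text": "Export all",
    "information": "",
}
features = encode_click(click)
print(len(features))
```

Prefixing each value with its column name keeps the same string in two different columns from always landing in the same bucket. The trade-off is that hash collisions are possible and the encoding cannot be inverted.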

I am really confused about how to approach data preprocessing. Any insight or suggestions would be appreciated.
