Reputation: 724
I am very sorry if this question violates SO's question guidelines, but I am stuck and cannot find anywhere else to ask this type of question. Suppose I have a dataset containing three sets of experimental data obtained under three different conditions (hot, cold, comfortable). The data is arranged in three columns of a pandas dataframe that has 4 columns in total (time, cold, comfortable and hot).
When I plot the data, I can visually see the separation of the three experiments, but I would like to do it automatically with machine learning.
The x-axis represents the time and the y-axis represents the magnitude of the data. I have read about different machine learning classification techniques, but I do not understand how to set up my data so that I can 'feed' it into the classification algorithm. Namely, my questions are:
hot, comfortable or cold. The time series is not of much relevance in my case.
Upvotes: 1
Views: 452
Reputation: 12515
Of course this is feasible.
It's not entirely clear from the original post exactly what variables/features you have available for your model, but here is some general guidance. All of these machine learning problems, from classification to regression, rely on the same core assumption: you are trying to predict some outcome based on a set of inputs. Usually this relationship is modeled like this: y ~ X1 + X2 + X3 ..., where y is your outcome ("dependent") variable, and X1, X2, etc. are features ("explanatory" variables). More simply, we can say that using our entire feature-set matrix X (i.e. the matrix containing all of our x-variables), we can predict some outcome variable y using a variety of ML techniques.
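To make the X and y idea concrete for the dataframe described in the question, here is a minimal sketch. The numbers are made up for illustration, and the column names condition and magnitude are my own assumptions, not something from the original post. It reshapes the wide layout (one column per condition) into one row per measurement, so that the condition name becomes the label y.

```python
import pandas as pd

# Hypothetical data in the wide layout described in the question:
# one time column plus one magnitude column per condition.
df = pd.DataFrame({
    "time": [0, 1, 2, 3],
    "cold": [1.0, 1.2, 0.9, 1.1],
    "comfortable": [5.0, 5.3, 4.8, 5.1],
    "hot": [9.0, 9.4, 8.7, 9.2],
})

# Reshape to long form: one row per (time, condition) measurement.
# "condition" and "magnitude" are illustrative names chosen here.
long_df = df.melt(id_vars="time", var_name="condition", value_name="magnitude")

# X is the 2-D feature matrix, y is the 1-D vector of labels to predict.
X = long_df[["magnitude"]]
y = long_df["condition"]
```

Whether time should also go into X depends on whether it carries information about the condition; the sketch leaves it out because the question says it is not very relevant.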
So in your case, you'd try to predict whether it's Cold, Comfortable, or Hot based on time. This is really more of a forecasting problem than an ML problem, since you have a time component that looks to be one of the most important (if not the only) features in your dataset. You may want to look at some simpler time-series forecasting methods (e.g. ARIMA) instead of ML algorithms, as some of the time-series ML approaches may not be well-suited for a beginner.
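If time really is not relevant, as the question states, the same setup can also be treated as plain classification on the magnitude alone. The following is only a sketch under assumptions: the data is synthetic, the column names condition and magnitude are illustrative, and k-nearest neighbours from scikit-learn is an arbitrary choice of classifier, not something the question or this answer prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real measurements (assumed values).
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "time": np.arange(n),
    "cold": rng.normal(1.0, 0.3, n),
    "comfortable": rng.normal(5.0, 0.3, n),
    "hot": rng.normal(9.0, 0.3, n),
})

# Long form: the condition name becomes the label, the magnitude the feature.
long_df = df.melt(id_vars="time", var_name="condition", value_name="magnitude")
X = long_df[["magnitude"]]
y = long_df["condition"]

# Hold out part of the data to check how well the classifier generalises.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Classify a new, unlabelled magnitude reading.
new_reading = pd.DataFrame({"magnitude": [8.8]})
predicted = clf.predict(new_reading)[0]
print(accuracy, predicted)
```

On real data the conditions may overlap far more than in this synthetic example, so the held-out accuracy is the number to watch before trusting any prediction.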
In any case, this should get you started, I think.
Upvotes: 1