Input formatting for models such as logistic regression and KNN for Python

Question

In my training set I have 24 Feature Vectors(FV). Each FV contains 2 lists. When I try to fit this on model = LogisticRegression() or model = KNeighborsClassifier(n_neighbors=k) I get this error ValueError: setting an array element with a sequence.

In my dataframe, each row represents each FV. There are 3 columns. The first column contains a list of an individual's heart rate, second a list of the corresponding activity data and third the target. Visually, it looks like something like this:

HR                         ACT                      Target
[0.5018, 0.5106, 0.4872]  [0.1390, 0.1709, 0.0886]  1
[0.4931, 0.5171, 0.5514]  [0.2423, 0.2795, 0.2232]  0

Should I:

Join both lists to form on long FV
Expand both lists such that each column represents one value. In other words, if there are 5 items in HR and ACT data for a FV, the new dataframe would have 10 columns for features and 1 for Target.

How does Logistic Regression and KNNs handle input data? I understand that logistic regression combines the input linearly using weights or coefficient values. But I am not sure what that means when it comes to lists VS dataframe columns. Does it mean it automatically converts corresponding values of dataframe columns to a list before transforming? Is there a difference between method 1 and 2?

Additionally, if a long list is required, should I have the long list as [HR,HR,HR,ACT,ACT,ACT] or [HR,ACT,HR,ACT,HR,ACT].

Ahmed Ragab · Accepted Answer

You should go with 2

Expand both lists such that each column represents one value. In other words, if there are 5 items in HR and ACT data for a FV, the new dataframe would have 10 columns for features and 1 for Target.

You should then select the feature columns from the dataframe and pass it as X, and the target column as Y to the model's fit function.

Sklearn's models accepts inputs with the following shape [n_samples, n_features], and since after following the 2nd solution you proposed, your training dataframe will have 2D of the shape [n_samples, 10].

Input formatting for models such as logistic regression and KNN for Python

Answers (1)

Related Questions