Reputation: 247
I'm trying to use an SVM, but I don't know how to fit the model when my data is in a pandas DataFrame. My data looks like this:
df = pd.DataFrame({"x": ['011', '100', '111'] , "y": [0,1,0]})
df.x = df.x.apply(lambda x: np.array(list(map(int,x))))
>>> df
x y
0 [0, 1, 1] 0
1 [1, 0, 0] 1
2 [1, 1, 1] 0
If I try to fit the model this way:
clf = svm.SVC().fit(df.x, df.y)
I am getting this error:
ValueError: setting an array element with a sequence.
What is the correct way to fit the SVM using this data frame?
Upvotes: 5
Views: 12337
Reputation: 1
import numpy as np
from sklearn.svm import SVC
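# Convert the data frame's columns into arrays. The x column holds one
# sequence per row, so stack the rows into a 2D (n_samples, n_features) array;
# a plain df['x'].to_numpy() returns a 1D object array and raises the same error.
features = np.array(df['x'].tolist())
labels = df['y'].to_numpy()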
# feed into your classifier
SVC().fit(features,labels)
Upvotes: 0
Reputation: 8813
Another solution is the code below.
import pandas as pd
import numpy as np
from sklearn.svm import SVC
df = pd.DataFrame({"x": ['011', '100', '111'] , "y": [0,1,0]})
# Split each string into one integer column per character
x = df.x.apply(lambda s: pd.Series(list(map(int, s))))
x
# Out[2]:
# 0 1 2
# 0 0 1 1
# 1 1 0 0
# 2 1 1 1
SVC().fit(x, df.y)
# Out[3]:
# SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
# decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
# max_iter=-1, probability=False, random_state=None, shrinking=True,
# tol=0.001, verbose=False)
Upvotes: 2
Reputation: 402932
df = pd.DataFrame({"x": ['011', '100', '111'] , "y": [0,1,0]})
df.x = df.x.apply(lambda x: list(map(int,x)))
df
x y
0 [0, 1, 1] 0
1 [1, 0, 0] 1
2 [1, 1, 1] 0
df.x is a column of lists. This probably isn't the best way to store data, and sklearn cannot interpret it as a 2D feature matrix: it sees a 1D object array whose elements are sequences, which is what triggers the ValueError. It is simpler to convert everything to a list of lists and pass that to SVC. Try this:
x = df.x.tolist()
print(x)
[[0, 1, 1], [1, 0, 0], [1, 1, 1]]
SVC().fit(x, df.y)
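For reference, here is a minimal end-to-end sketch of the same idea. It assumes the three-row frame from the question; the names raw, X, clf and pred are only illustrative. It prints the shape of the column before and after conversion, which shows why the original call fails, then fits and predicts on the training rows.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Same toy data as in the question
df = pd.DataFrame({"x": ["011", "100", "111"], "y": [0, 1, 0]})
df["x"] = df["x"].apply(lambda s: list(map(int, s)))

# As stored, the column is a 1D object array: one Python list per row
raw = df["x"].to_numpy()
print(raw.shape, raw.dtype)  # (3,) object

# Stack the rows into the 2D (n_samples, n_features) matrix sklearn expects
X = np.array(df["x"].tolist())
print(X.shape)  # (3, 3)

# Fit on the 2D matrix, then predict on the same rows as a smoke test
clf = SVC().fit(X, df["y"])
pred = clf.predict(X)
```

Passing the plain list of lists from df.x.tolist() works as well, since sklearn converts it to an array internally; building X explicitly just makes the shape easy to inspect.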
Upvotes: 7