Reputation: 151
I have a CSV file in which each row starts with a classification label, followed by the data associated with it:
cat, 0, 1, 45, 23, ...
dog, 1, 5, 75, 23, ...
cat, 3, 4, 63, 24, ...
cat, 0, 1, 44, 23, ...
dog, 7, 3, 25, 4, ...
How can I load the csv file into sklearn?
Edit: or do I need to replace the labels with number equivalents? I.e. dog = 1, cat = 2, etc.
Upvotes: 0
Views: 547
Reputation: 567
* Edited based on Vivek's comment
You could use pandas. Here is an example of feeding the data into a simple random forest classifier:
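import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# The sample rows have no header line, so tell pandas not to treat the first row as one;
# the columns are then numbered 0, 1, 2, ...
data = pd.read_csv('/path/to/data', header=None, skipinitialspace=True)
Y = data[0]  # labels: column 0 as a 1-D Series
X = data.drop(columns=[0])  # features: every remaining column
clf = RandomForestClassifier()
clf.fit(X, Y)  # string labels such as 'cat' and 'dog' are accepted as-is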
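To try this without a file on disk, here is a minimal sketch that reads the five sample rows from the question out of an in-memory string. The `csv_text` variable and the choice to keep only the four numeric columns that are visible in the question are my assumptions; the real file has more columns. It also touches on the question's edit: scikit-learn classifiers take string labels directly, so mapping them to numbers is optional. If you do want integers, `LabelEncoder` is one way to get them (it numbers the classes from 0 in sorted order, so not the dog = 1, cat = 2 mapping from the question).

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the CSV file: the question's five rows,
# cut down to the four numeric columns that are actually shown.
csv_text = """cat, 0, 1, 45, 23
dog, 1, 5, 75, 23
cat, 3, 4, 63, 24
cat, 0, 1, 44, 23
dog, 7, 3, 25, 4
"""

# header=None: the data has no header row, so columns are numbered 0, 1, 2, ...
# skipinitialspace=True: ignore the space that follows each comma
data = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)

y = data[0]                 # labels as a 1-D Series of strings
X = data.drop(columns=[0])  # the remaining columns are the features

# String labels can be passed straight to fit()
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
print(clf.classes_)

# Optional: integer-encode the labels instead
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
print(dict(zip(encoder.classes_, encoder.transform(encoder.classes_))))
```

Predictions from `clf.predict(...)` come back as the original strings. If you train on `y_encoded` instead, `encoder.inverse_transform(...)` maps the integers back to the label names.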
Upvotes: 2