Reputation: 67
I have a .csv file that contains my data. I would like to do Logistic Regression, Naive Bayes and Decision Trees. I already know how to implement these. However, my teacher wants me to split the data in my .csv file so that my algorithms train on 80% of it and predict the other 20%. I would like to know how to actually split the data that way.
import pandas as pd

diabetes_df = pd.read_csv("diabetes.csv")
diabetes_df.head()

with open("diabetes.csv", "rb") as f:
    data = f.read().split()   # splits the raw bytes on whitespace, not into rows

train_data = data[:80]        # first 80 items, not 80% of the rows
test_data = data[20:]         # everything after the first 20 items
I tried to split it like this (I'm sure it isn't working).
Upvotes: 0
Views: 6764
Reputation: 136675
Scikit-learn's sklearn.model_selection.train_test_split
is what you are looking for:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)
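For the 80/20 split your teacher asked for, set test_size=0.2. A minimal sketch applied to your file, assuming the label column in diabetes.csv is called "Outcome" (swap in your real column name):

import pandas as pd
from sklearn.model_selection import train_test_split

diabetes_df = pd.read_csv("diabetes.csv")

# Assumption: the target column is named "Outcome"; adjust to your data.
X = diabetes_df.drop(columns=["Outcome"])
y = diabetes_df["Outcome"]

# 80% of the rows for training, 20% held out for your models to predict.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)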
Upvotes: 4
Reputation: 77
splitted_csv = "value1,value2,value3".split(',')
print(splitted_csv)     # ['value1', 'value2', 'value3']
print(splitted_csv[0])  # value1
print(splitted_csv[1])  # value2
print(splitted_csv[2])  # value3
There are also libraries that parse CSV files and let you access values by column name, but from your example I thought you needed some "low level" way to do it.
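If you do want to do the 80/20 split by hand, here is a minimal sketch using only the standard csv and random modules; it assumes the first row of diabetes.csv is a header:

import csv
import random

with open("diabetes.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)           # assumes the first row holds column names
    rows = list(reader)

random.shuffle(rows)                # shuffle so the split is not order-dependent
split_point = int(len(rows) * 0.8)  # index marking 80% of the rows

train_data = rows[:split_point]     # first 80% for training
test_data = rows[split_point:]      # remaining 20% to predict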
Upvotes: -1