Reputation: 79
I have a big CSV file (around 5 GB). I am trying to read the whole file line by line and apply the most typical algorithms (SVM, Naive Bayes, Linear Regression, etc.).
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import csv
with open('top2Mmm.csv', 'r', newline='') as i_f:
    reader = csv.reader(i_f, delimiter=';')
    for row in reader:
        print("Row ->", row)
So far I have only managed to read the CSV, but I don't know how to take each row and build a model. I am starting with a smaller file to speed up the process, but I don't know how to make this work properly. Any clue or tip?
Upvotes: -1
Views: 557
Reputation: 2996
Separate your data (each row) into features (X) and labels (y). Then you can fit a model on them, for instance an SVM.
from sklearn.svm import SVC
clf = SVC()
clf.fit(X, y)
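For the splitting step itself, here is a minimal sketch using only the standard library. It assumes that every feature column is numeric, that the label sits in the last column, and that there is no header row; none of this is known about the asker's file, and `split_features_labels` is a hypothetical helper name.

```python
import csv
import io


def split_features_labels(lines, delimiter=";"):
    """Split delimited rows into features X and labels y.

    Assumptions (not from the original post): numeric features,
    label in the last column, no header row.
    """
    X, y = [], []
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        X.append([float(value) for value in row[:-1]])  # all but the last column
        y.append(row[-1])  # last column is the label
    return X, y


# Tiny in-memory stand-in for the real file (made-up values).
sample = io.StringIO("1.0;2.0;a\n3.0;4.0;b\n")
X, y = split_features_labels(sample)
```

With a real file you would pass the open file object instead of `sample`, and the resulting `X` and `y` can go straight into `clf.fit(X, y)`.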
Upvotes: 0
Reputation: 9357
You can use a pandas DataFrame to load the CSV and manipulate the data that way.
You can also iterate through the dataframe if needed.
import pandas as pd
df = pd.read_csv('top2Mmm.csv', sep=';')
# 'fieldName' stands for one of your column names
for index, row in df.iterrows():
    print(row['fieldName'])
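Since the file is around 5 GB, loading it in one go may not fit in memory. A possible workaround is the `chunksize` parameter of `pd.read_csv`, which returns an iterator of smaller DataFrames. The sketch below is only an illustration: `iter_chunks` is a hypothetical helper, the column names are made up, and the chunk size is an arbitrary choice.

```python
import io

import pandas as pd


def iter_chunks(source, chunk_rows=100_000, sep=";"):
    """Yield DataFrames holding at most chunk_rows rows each."""
    # With chunksize set, read_csv returns an iterator instead of one DataFrame.
    yield from pd.read_csv(source, sep=sep, chunksize=chunk_rows)


# Tiny in-memory stand-in for the real file (made-up columns and values).
sample = io.StringIO("a;b;label\n1;2;x\n3;4;y\n5;6;x\n")
chunk_sizes = [len(chunk) for chunk in iter_chunks(sample, chunk_rows=2)]
```

Each chunk can then be split into X and y on its own. Note that `SVC` has to see all the data at once; scikit-learn estimators that expose `partial_fit`, such as `SGDClassifier`, can instead be trained one chunk at a time.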
Upvotes: 1