WinterZ

Reputation: 79

csv stream for machine learning algorithms

I have a big CSV file (around 5 GB). I am trying to read the whole file line by line and apply the most typical algorithms (SVM, Naive Bayes, Linear Regression, etc.).

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import csv

# Stream the file one row at a time; the context manager closes it afterwards.
with open('top2Mmm.csv', 'r', newline='') as i_f:
    reader = csv.reader(i_f, delimiter=';')
    for row in reader:
        print("Row ->", row)

I have managed to read the CSV, but I don't know how to take each row and build a model from it. I am starting with a smaller file to speed up the process, but I don't know how to make this work properly. Any clue or tip?

Upvotes: -1

Views: 557

Answers (2)

YLJ

Reputation: 2996

Separate each row of your data into features (X) and a label (y). Once you have collected them, you can pass them to a model, for instance an SVM.

from sklearn.svm import SVC

# X: 2-D array-like of shape (n_samples, n_features); y: 1-D array-like of labels
clf = SVC()
clf.fit(X, y)

sklearn.svm reference
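To make the "separate into X and y" step concrete, here is a minimal sketch using only the standard library. It assumes the file has a header row, that every feature column is numeric, and that the label sits in the last column; none of that is stated in the question, and the helper name split_features_labels is made up for illustration.

```python
import csv
import io


def split_features_labels(lines, delimiter=";"):
    """Split delimited rows into a feature matrix X and a label list y.

    Assumptions (not from the question): first row is a header, all
    feature columns are numeric, and the label is the last column.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader)  # skip the header row
    X, y = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        *features, label = row
        X.append([float(value) for value in features])
        y.append(label)
    return X, y


# Tiny in-memory stand-in for the real file.
sample = io.StringIO("f1;f2;label\n1.0;2.0;a\n3.5;4.5;b\n")
X, y = split_features_labels(sample)
```

With a real file you would pass the open file object instead of the StringIO sample, and the resulting X and y can go straight into clf.fit(X, y). Note that this still collects every row in memory, so for a 5 GB file it only helps if the parsed data fits in RAM.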

Upvotes: 0

Simon O'Doherty

Reputation: 9357

You can load the CSV into a pandas DataFrame and manipulate the data that way.

You can also iterate over the rows of the DataFrame if needed.

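import pandas as pd

df = pd.read_csv('top2Mmm.csv', sep=';')
# iterrows() yields (index, row) pairs; 'fieldName' stands for one of your column names
for index, row in df.iterrows():
    print(row['fieldName'])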
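Since the file is around 5 GB, loading it in one go may not fit in memory. One possible approach, sketched below, is to have pd.read_csv return the file in chunks via its chunksize parameter and to train an estimator that supports incremental learning through partial_fit. SGDClassifier is used here as a stand-in: with its default hinge loss it behaves like a linear SVM, but it is not the same as SVC. The column name "label", the class values, and the helper name fit_in_chunks are assumptions for illustration only.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier


def fit_in_chunks(source, label_column, classes, chunksize=100_000, sep=";"):
    """Train a linear classifier incrementally, one chunk of rows at a time."""
    clf = SGDClassifier(random_state=0)
    # With chunksize set, read_csv yields DataFrames of at most that many rows.
    for chunk in pd.read_csv(source, sep=sep, chunksize=chunksize):
        X = chunk.drop(columns=[label_column]).to_numpy(dtype=float)
        y = chunk[label_column].to_numpy()
        # partial_fit needs the full set of classes up front, because a
        # single chunk may not contain an example of every class.
        clf.partial_fit(X, y, classes=classes)
    return clf


# Tiny in-memory stand-in for the real file (hypothetical columns).
sample = io.StringIO(
    "f1;f2;label\n0.0;0.1;0\n0.2;0.0;0\n5.0;5.1;1\n5.2;4.9;1\n"
)
clf = fit_in_chunks(sample, label_column="label",
                    classes=np.array([0, 1]), chunksize=2)
```

For the real data you would pass the file path instead of the StringIO sample. Other scikit-learn estimators with a partial_fit method (for example GaussianNB or MultinomialNB) can be swapped in the same way; SVC itself has no partial_fit, so it needs the whole training set at once.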

Upvotes: 1
