user3316920

Large text file to csv, can't open text file

I'm trying to convert this 3.1 GB text file from https://snap.stanford.edu/data/ into a CSV file. All the data is structured like:

which makes it a pretty large text file with a few million lines. I have tried to write a Python script to convert it, but for some reason it doesn't read any lines in my for loop.

Here is the code:

import csv


def trycast(x):
    try:
        return float(x)
    except:
        try:
            return int(x)
        except:
            return x

cols = ['product_productId', 'review_userId', 'review_profileName', 'review_helpfulness', 'review_score', 'review_time', 'review_summary', 'review_text']

f = open("movies.txt", "wb")
w = csv.writer(f)
w.writerow(cols)


doc =  {}

with open('movies.txt') as infile:
    for line in infile:
        line = line.strip()
        if line=="":
            w.writerow([doc.get(col) for col in cols])
            doc = {}
        else:
            idx = line.find(":")
            key, value = tuple([line[:idx], line[idx+1:]])
            key = key.strip().replace("/", "_").lower()
            value = value.strip()
            doc[key] = trycast(value)
    f.close()

I'm not sure if it is because the document is too large, since a regular notepad program can't open it.

Thanks in advance! :-)

Upvotes: 0

Views: 460

Answers (1)

Garogolun

Reputation: 353

In the line f = open("movies.txt", "wb") you're opening the file for writing, which truncates it and thereby deletes all its content. Later on, you're trying to read from that same, now empty, file, so the for loop has nothing to iterate over. It probably works fine if you change the output filename. (I am not going to download 3.1 GB to test it. ;) )
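
For illustration, here is a minimal sketch of that change. It is untested against the real file and rests on a few assumptions of mine: Python 3, a hypothetical output name (movies.csv), latin-1 as the input encoding, and records separated by blank lines as in your loop. The function name convert is made up too. It also looks up the columns in lowercase, because your keys go through .lower() while names like 'product_productId' in cols do not, so doc.get(col) would return None for those. I left out trycast, since csv.writer turns the values back into text anyway.

```python
import csv

COLS = ['product_productId', 'review_userId', 'review_profileName',
        'review_helpfulness', 'review_score', 'review_time',
        'review_summary', 'review_text']


def convert(in_path, out_path, encoding="latin-1"):
    """Stream 'key: value' records separated by blank lines into a CSV file.

    The encoding default is an assumption; change it if the file differs.
    Returns the number of records written.
    """
    # Keys are lowercased while parsing, so look the columns up in lowercase.
    lookup = [col.lower() for col in COLS]
    written = 0
    # Input and output are two different files, so nothing gets truncated.
    with open(in_path, encoding=encoding) as infile, \
            open(out_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(COLS)
        doc = {}
        for line in infile:  # reads one line at a time, not the whole file
            line = line.strip()
            if not line:
                if doc:  # a blank line ends the current record
                    writer.writerow([doc.get(key) for key in lookup])
                    written += 1
                doc = {}
                continue
            key, sep, value = line.partition(":")
            if not sep:
                continue  # no colon on this line: skip it in this sketch
            doc[key.strip().replace("/", "_").lower()] = value.strip()
        if doc:  # last record when the file does not end with a blank line
            writer.writerow([doc.get(key) for key in lookup])
            written += 1
    return written
```

You would call it as convert("movies.txt", "movies.csv"). Because the loop iterates over the file object line by line, the size of the file should not be a problem in itself.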

Upvotes: 2
