Reputation:
I'm trying to convert this 3.1 GB text file from https://snap.stanford.edu/data/ into a CSV file. All the data is structured like:

product/productId: ...
review/userId: ...
review/profileName: ...
review/helpfulness: ...
review/score: ...
review/time: ...
review/summary: ...
review/text: ...

with a blank line between records, which makes it a pretty large text file with several million lines. I have tried to write a Python script to convert it, but for some reason it won't read the lines in my for-each loop.
Here is the code:
import csv

def trycast(x):
    try:
        return float(x)
    except:
        try:
            return int(x)
        except:
            return x
cols = ['product_productId', 'review_userId', 'review_profileName', 'review_helpfulness', 'review_score', 'review_time', 'review_summary', 'review_text']
f = open("movies.txt", "wb")
w = csv.writer(f)
w.writerow(cols)

doc = {}
with open('movies.txt') as infile:
    for line in infile:
        line = line.strip()
        if line == "":
            w.writerow([doc.get(col) for col in cols])
            doc = {}
        else:
            idx = line.find(":")
            key, value = line[:idx], line[idx+1:]
            key = key.strip().replace("/", "_").lower()
            value = value.strip()
            doc[key] = trycast(value)
f.close()
I'm not sure if it is because the file is too large; a regular notepad program can't even open it.
Thanks in advance! :-)
Upvotes: 0
Views: 460
Reputation: 353
In the line f = open("movies.txt", "wb")
you're opening the file for writing, which immediately truncates it and deletes all its content. Later on, you try to read from that same, now-empty file. It will probably work fine once you change the output filename. (I am not going to download 3.1 GB to test it. ;) )
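As a minimal sketch of the fix, assuming the output goes to a separate file named movies.csv (the name is arbitrary) and Python 3, where the csv module wants text mode with newline=''. I've also reordered trycast to try int before float, since float() accepts every string that int() does, so the original int branch could never be reached:

import csv

def trycast(x):
    # Try int first, then float; otherwise keep the raw string.
    for cast in (int, float):
        try:
            return cast(x)
        except ValueError:
            pass
    return x

cols = ['product_productId', 'review_userId', 'review_profileName',
        'review_helpfulness', 'review_score', 'review_time',
        'review_summary', 'review_text']

# Write to a *different* file so the input is not truncated.
with open('movies.csv', 'w', newline='') as outfile, \
     open('movies.txt') as infile:
    w = csv.writer(outfile)
    w.writerow(cols)
    doc = {}
    for line in infile:
        line = line.strip()
        if line == "":  # a blank line ends one record
            w.writerow([doc.get(col) for col in cols])
            doc = {}
        else:
            idx = line.find(":")
            key, value = line[:idx], line[idx + 1:]
            key = key.strip().replace("/", "_").lower()
            doc[key] = trycast(value.strip())
    if doc:  # flush the last record in case the file doesn't end with a blank line
        w.writerow([doc.get(col) for col in cols])

Since this reads one line at a time, the 3.1 GB size itself isn't a problem; memory use stays small no matter how long the file is.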
Upvotes: 2