Hayden Bills

Reputation: 1

Python CSV trouble

I am writing code that reads a very large CSV file with readlines(). An init function stores the lines in a global variable, and I then go through that variable to search for specific words and count how many times each one comes up in the file.

def init(filename):
    global lines
    with open(filename) as file:
        lines = file.readlines()


def total():
    males = 0
    females = 0
    for i in range(0, len(lines)):
        current_line = lines[i].split(",")
        if current_line[5] == 'M\n':
            males += 1
        elif current_line[5] == 'F\n':
            females += 1

    total_dict = {"Gender": {"M": males, "F": females}}
    return total_dict

For some reason this code works with a smaller file, but I can't get it to work with a very large one.

Upvotes: 0

Views: 49

Answers (1)

user13517564

Reputation:

If by "super large" you mean a file that does not fit in RAM, then this is expected: readlines() loads the whole file into memory, and only then do you process it one row at a time. Why not read the file line by line instead? You can iterate over the file object directly with for line in file: ...

def total(name):
    males = females = 0
    with open(name, "rt") as f:
        for line in f:
            current = line.rstrip("\r\n").split(",")
            if current[5] == "M":
                males += 1
            elif current[5] == "F":
                females += 1
    return {"Gender": {"M": males, "F": females}}

Or with a Counter (it's like a dict but you don't have to initialize zero values, entries are automatically added when you do gender[...] += 1):

from collections import Counter

def total(name):
    gender = Counter()
    with open(name, "rt") as f:
        for line in f:
            current = line.rstrip("\r\n").split(",")
            gender[current[5]] += 1
    return {"Gender": gender}
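
To make the Counter behaviour described above concrete, here is a minimal sketch on a hand-made list of values (the sample values are made up for illustration, they are not from the question's file):

```python
from collections import Counter

gender = Counter()
# Hypothetical sample of column values; "X" stands for any unexpected entry.
for value in ["M", "F", "M", "X"]:
    gender[value] += 1  # no need to initialize gender[value] to 0 first

print(gender["M"])  # 2
print(gender["Z"])  # 0: a missing key reads as zero and is not added by the lookup
```

Note that, unlike the if/elif version, a Counter counts every distinct value it sees in that column, so a header row or an unexpected value would get its own entry.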

Note also that to read a CSV file, you could use the csv module.

import csv
from collections import Counter

def total(name):
    gender = Counter()
    with open(name, "rt", newline="") as f:
        for current in csv.reader(f):
            gender[current[5]] += 1
    return {"Gender": gender}

Some other coding advice, not directly related to your current problem: avoid global variables unless there is a very good reason to use one. Here you could simply return the list, if you insist on reading the whole file in init. And when looping over a list, don't use a range as in for i in range(len(a)):, write for x in a: instead, unless you really need the index for some reason. If you do need the index, it's often better to write for i, x in enumerate(a):
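
A rough sketch of what that advice could look like when applied to the question's code. The names read_lines, count_genders and first_short_row are hypothetical, chosen only for this illustration, and this version still loads the whole file into memory, so it does not address the large-file problem by itself:

```python
def read_lines(filename):
    # Hypothetical replacement for init(): return the list instead of
    # storing it in a global variable.
    with open(filename) as f:
        return f.readlines()


def count_genders(lines):
    males = females = 0
    for line in lines:  # iterate over the list directly, no index needed
        fields = line.rstrip("\r\n").split(",")
        if fields[5] == "M":
            males += 1
        elif fields[5] == "F":
            females += 1
    return {"Gender": {"M": males, "F": females}}


def first_short_row(lines):
    # enumerate() yields the index together with the item, for the cases
    # where the index is really needed (here: reporting a row number).
    for i, line in enumerate(lines):
        if len(line.split(",")) < 6:
            return i
    return None
```

It would be called as count_genders(read_lines(filename)), so the data is passed explicitly from one function to the next.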

Upvotes: 1
