Reputation: 457
I have a 1 GB log file (.txt) in the following format:
[ABC] [12.45] [bla bla bla] [12345]
[DEF] [12.45] [bla bla bla] [12345]
I am trying to parse it into one array per bracketed field. So far I have tried numpy.genfromtxt and also reading the file line by line. numpy raises a MemoryError on the 1 GB file, and the line-by-line method takes about 35 seconds.
Is there another library or approach that would speed up the parsing?
Reading line by line:
import datetime
import re

A, B, C = [], [], []
with open(filePath) as f:
    for line in f:
        # Capture the contents of every [...] group on the line.
        splits = re.findall(r'\[(.*?)\]', line)
        A.append(splits[0].strip())
        B.append(datetime.datetime.strptime(splits[2], '%H:%M:%S.%f'))
        C.append(splits[4])
Upvotes: 1
Views: 126
Reputation: 495
You can speed up the parsing significantly by using str.split instead of re.findall: splitting on the fixed delimiter '] [' avoids running the regex engine on every line.
A, B, C, D = [], [], [], []
with open('input.txt') as f:
    for line in f:
        # Strip the newline first so the closing ']' really is the
        # last character, then split on the fixed '] [' delimiter.
        splits = line.rstrip('\n').split('] [')
        A.append(splits[0][1:])    # drop the leading '['
        B.append(splits[1])
        C.append(splits[2])
        D.append(splits[3][:-1])   # drop the trailing ']'
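If you want the fields as columns rather than four Python lists, a bulk reader is another option. Here is a minimal sketch, assuming pandas is available and every row has exactly four bracketed fields; the column names A-D are just placeholders. Note that a multi-character separator forces pandas onto its slower Python engine, so it is worth timing this against the str.split loop on a sample of the real file.

import pandas as pd

# The separator is a regular expression matching the literal '] [';
# multi-character separators require the Python parsing engine.
df = pd.read_csv('input.txt', sep=r'\] \[', engine='python',
                 header=None, names=['A', 'B', 'C', 'D'])

# Strip the leading '[' and trailing ']' left on the outer columns.
df['A'] = df['A'].str.lstrip('[')
df['D'] = df['D'].str.rstrip(']')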
Upvotes: 1