Reputation: 457
I have a 1 GB log file (.txt) in the following format:
[ABC] [12.45] [bla bla bla] [12345]
[DEF] [12.45] [bla bla bla] [12345]
I am trying to parse it into one array per bracketed field. So far I have tried numpy.genfromtxt and also reading the file line by line. numpy raises a MemoryError on the 1 GB file, and the line-by-line method takes about 35 seconds.
Is there another library or approach that would speed up the parsing?
Reading line by line:
import datetime
import re

A, B, C = [], [], []
with open(filePath) as f:
    for line in f:
        # Capture the contents of every [...] group on the line.
        splits = re.findall(r'\[(.*?)\]', line)
        A.append(splits[0].strip())
        B.append(datetime.datetime.strptime(splits[2], '%H:%M:%S.%f'))
        C.append(splits[4])
Upvotes: 1
Views: 126
Reputation: 495
You can speed up the parsing significantly by using str.split instead of re.findall: splitting on the fixed delimiter '] [' avoids running the regex engine on every line.
A, B, C, D = [], [], [], []
with open('input.txt') as f:
    for line in f:
        # Strip the newline first so the closing ']' really is the
        # last character, then split on the fixed '] [' delimiter.
        splits = line.rstrip('\n').split('] [')
        A.append(splits[0][1:])    # drop the leading '['
        B.append(splits[1])
        C.append(splits[2])
        D.append(splits[3][:-1])   # drop the trailing ']'
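If you want the fields as columns rather than four Python lists, a bulk reader is another option. Here is a minimal sketch, assuming pandas is available and every row has exactly four bracketed fields; the column names A-D are just placeholders. Note that a multi-character separator forces pandas onto its slower Python engine, so it is worth timing this against the str.split loop on a sample of the real file.

import pandas as pd

# The separator is a regular expression matching the literal '] [';
# multi-character separators require the Python parsing engine.
df = pd.read_csv('input.txt', sep=r'\] \[', engine='python',
                 header=None, names=['A', 'B', 'C', 'D'])

# Strip the leading '[' and trailing ']' left on the outer columns.
df['A'] = df['A'].str.lstrip('[')
df['D'] = df['D'].str.rstrip(']')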
Upvotes: 1