Reputation: 477
In the method below I sort the contents of a file by timestamp, and that part works fine. However, I don't know how to add a newline after each line when writing to the newly created files: everything ends up on one line, and I want each record on its own line in the output file. Because the input is very large, I need to read it in chunks, so using readlines() or storing everything in a data structure won't work here.
1) My input file format is as follows:
TIME[04.26_12:30:30:853664] ID[ROLL:201987623] MARKS[PHY:100|MATH:200|CHEM:400]
TIME[03.27_12:29:30.553669] ID[ROLL:201987623] MARKS[PHY:100|MATH:1200|CHEM:900]
TIME[03.26_12:28:30.753664] ID[ROLL:2341987623] MARKS[PHY:100|MATH:200|CHEM:400]
TIME[03.26_12:29:30.853664] ID[ROLL:201978623] MARKS[PHY:0|MATH:0|CHEM:40]
TIME[04.27_12:29:30.553664] ID[ROLL:2034287623] MARKS[PHY:100|MATH:200|CHEM:400]
The code is as follows:
import re
from functools import partial
from itertools import groupby
from typing import Tuple

# \s* allows the space between "]" and "ID[" that appears in the input lines
regex = re.compile(r"^.*TIME\[([^]]+)\]\s*ID\[ROLL:([^]]+)\].+$")

def func1(arg) -> bool:
    # keep only lines that look like a record
    return regex.match(arg) is not None

def func2(arg) -> Tuple[str, int]:
    # sort key: (timestamp, roll number)
    match = regex.match(arg)
    if match:
        return match.group(1), int(match.group(2))
    return "", 0

def func3(arg) -> int:
    # grouping key: roll number
    match = regex.match(arg)
    if match:
        return int(match.group(2))
    return 0

def read_in_chunks(file_object, chunk_size=1024*1024):
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('b.txt') as fr:
    for chunk in read_in_chunks(fr):
        collection = filter(func1, chunk.splitlines())
        collection = sorted(collection, key=func2)
        for key, group in groupby(collection, key=func3):
            # "wa" is not a valid mode for open(); "a" appends
            with open(f"ROLL_{key}", mode="a") as fw:
                fw.writelines(group)  # want suggestions to append newline character before every line
2) Actual output I am getting now
In the file named ROLL_201987623.txt:
TIME[03.27_12:29:30.553669] ID[ROLL:201987623] MARKS[PHY:100|MATH:1200|CHEM:900] TIME[04.26_12:30:30:853664] ID[ROLL:201987623] MARKS[PHY:100|MATH:200|CHEM:400]
3) Desired output (each record on its own line, as in the input format)
TIME[03.27_12:29:30.553669] ID[ROLL:201987623] MARKS[PHY:100|MATH:1200|CHEM:900]
TIME[04.26_12:30:30:853664] ID[ROLL:201987623] MARKS[PHY:100|MATH:200|CHEM:400]
Currently all of the output ends up on a single line, and that is my main problem.
Upvotes: 0
Views: 149
Reputation: 8001
The writelines() function, despite its name, does not add a newline character to each line. (This mirrors the .readlines() function, which does not remove the \n at the end of each line it reads from the file.)
I would suggest using fw.writelines([i + '\n' for i in group]) to add the necessary line breaks manually.
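To see the difference in isolation, here is a minimal sketch that writes two of the sample records to in-memory files instead of real ones. The helper name write_lines and the use of io.StringIO are my own choices for illustration, not something from the question.

```python
import io

def write_lines(fw, lines):
    """Hypothetical helper: write each string followed by a newline."""
    fw.writelines(line + "\n" for line in lines)

records = [
    "TIME[03.27_12:29:30.553669] ID[ROLL:201987623] MARKS[PHY:100|MATH:1200|CHEM:900]",
    "TIME[04.26_12:30:30:853664] ID[ROLL:201987623] MARKS[PHY:100|MATH:200|CHEM:400]",
]

# writelines() on its own just concatenates the strings
plain = io.StringIO()
plain.writelines(records)

# adding "\n" to every item puts each record on its own line
fixed = io.StringIO()
write_lines(fixed, records)

print(plain.getvalue().count("\n"))  # number of newlines without the fix
print(fixed.getvalue().count("\n"))  # number of newlines with the fix
```

A generator expression is used instead of a list so that a large group is never copied in memory, which should matter for the chunked input described in the question.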
Upvotes: 1
Reputation: 4462
Maybe this will help:
# suggestions to append newline character before every line
group = map(lambda x: x + '\n', group)
fw.writelines(group)
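For context, this is roughly how the map() version might sit inside the grouping loop. It is only a sketch under a few assumptions of mine: the regex tolerates the space before ID[, the files go into a temporary directory with a .txt suffix, they are opened in append mode, and the helper names sort_key and roll_of are made up for the example.

```python
import re
import tempfile
from itertools import groupby
from pathlib import Path

# Assumed pattern: like the question's, but allowing whitespace before "ID["
regex = re.compile(r"^.*TIME\[([^]]+)\]\s*ID\[ROLL:([^]]+)\].+$")

def sort_key(line):
    """Hypothetical helper: (timestamp, roll number) for sorting."""
    m = regex.match(line)
    return (m.group(1), int(m.group(2))) if m else ("", 0)

def roll_of(line):
    """Hypothetical helper: roll number used as the grouping key."""
    m = regex.match(line)
    return int(m.group(2)) if m else 0

# One small "chunk" built from the sample input in the question
chunk = (
    "TIME[04.26_12:30:30:853664] ID[ROLL:201987623] MARKS[PHY:100|MATH:200|CHEM:400]\n"
    "TIME[03.27_12:29:30.553669] ID[ROLL:201987623] MARKS[PHY:100|MATH:1200|CHEM:900]\n"
    "TIME[03.26_12:28:30.753664] ID[ROLL:2341987623] MARKS[PHY:100|MATH:200|CHEM:400]\n"
)

out_dir = Path(tempfile.mkdtemp())  # temporary directory, for the demo only
lines = sorted(filter(regex.match, chunk.splitlines()), key=sort_key)
for key, group in groupby(lines, key=roll_of):
    # append mode, so a later chunk can add to the same roll's file
    with open(out_dir / f"ROLL_{key}.txt", mode="a") as fw:
        fw.writelines(map(lambda x: x + "\n", group))

print((out_dir / "ROLL_201987623.txt").read_text())
```

Note that splitlines() strips the line endings, which is why they have to be added back before writing.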
Upvotes: 1