Reputation: 23
I have a list of files and I want to combine the files targeting the same serial number. Each file contains thousands of lines, and each line has the following format: date, count, reading.
For example the first file:
"2019-12-23 00:00:00",1123,211685,34650.75,33225.69,...(hundreds of similar numbers)
...(hundreds of similar lines)
"2020-02-23 06:00:00",1372,211685,34651.22,33224.6,...
"2020-02-23 12:00:00",1373,211685,34650.34,33224.74,...
The 2nd file:
"2019-12-17 12:00:00",1101,211685,34649.3,33225.8...
...
"2020-02-15 00:00:00",1339,211685,34651.66,33225.32,...
"2020-02-15 06:00:00",1340,211685,34651.63,33225.19...
The problem is that the missing lines can be either at the beginning or at the end of a file: the initial 100 readings might be missing in one file, while the other file might be missing the latest 50 readings. What would be the best way to merge them? I can think of using "set", but I'm not sure whether it can add missing lines in the middle of a file.
An example of the merged output:
"2019-12-17 12:00:00",1101,211685,34649.3,33225.8...
...
"2019-12-23 00:00:00",1123,211685,34650.75,33225.69,...
...
"2020-02-15 00:00:00",1339,211685,34651.66,33225.32,...
"2020-02-15 06:00:00",1340,211685,34651.63,33225.19...
...
"2020-02-23 06:00:00",1372,211685,34651.22,33224.6,...
"2020-02-23 12:00:00",1373,211685,34650.34,33224.74,...
Upvotes: 1
Views: 532
Reputation: 98921
You can try using:
from datetime import datetime
from pprint import pprint

files = ["merge_01.txt", "merge_02.txt"]

# Collect every line from every file; a set removes the duplicates
# from the overlapping region.
all_lines = []
for file in files:
    with open(file) as f:
        all_lines += [x.strip() for x in f]
all_lines = list(set(all_lines))

# The slice [1:20] skips the opening quote and takes the 19-character
# timestamp, which strptime turns into a sortable datetime.
all_lines.sort(key=lambda line: datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S"))
pprint(all_lines)

with open("merge_all.txt", "w") as f:
    for line in all_lines:
        f.write(line + "\n")
Output:
['"2019-12-17 12:00:00",1101,211685,34649.3,33225.8',
'"2019-12-23 00:00:00",1123,211685,34650.75,33225.69',
'"2020-02-15 00:00:00",1339,211685,34651.66,33225.32',
'"2020-02-15 06:00:00",1340,211685,34651.63,33225.19',
'"2020-02-23 06:00:00",1372,211685,34651.22,33224.6',
'"2020-02-23 12:00:00",1373,211685,34650.34,33224.74']
Pandas Solution:
import pandas as pd

files = ["merge_01.txt", "merge_02.txt"]

# Read every line, dropping the quotes around the timestamp.
all_lines = []
for file in files:
    with open(file) as f:
        all_lines += [x.strip().replace('"', "") for x in f]

# Note: the column list assumes five fields per line; extend it to match
# the real files.
df = pd.DataFrame([sub.split(",") for sub in all_lines],
                  columns=["date", "field1", "field2", "field3", "field4"])
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date').drop_duplicates()
df.to_csv('merged.csv', index=False)
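If the real files have hundreds of numeric columns, hard-coding five column names won't scale. Here is a sketch of a variant that lets pandas infer the columns, reusing the same hypothetical filenames; header=None because the files have no header row:

import pandas as pd

files = ["merge_01.txt", "merge_02.txt"]

# pandas strips the quotes around the timestamp while parsing.
df = pd.concat((pd.read_csv(f, header=None) for f in files), ignore_index=True)
df[0] = pd.to_datetime(df[0])  # column 0 holds the timestamp
df = df.drop_duplicates().sort_values(by=0)
df.to_csv("merged.csv", index=False, header=False)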
Upvotes: 1
Reputation: 77347
set doesn't maintain order, but you can sort the result afterwards to get the output file you want. When a date string is written as year-month-day hour:minute:second in UTC, it can be sorted either lowest to highest or highest to lowest as plain text, without any date conversion. Write it the American way, "June 30 2019 12:30 PM MST", and you'd need to convert it to something sortable first.
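For example, plain string sorting already puts the question's timestamps in chronological order (sample values taken from the question; the leading quote is identical on every line, so it doesn't affect the ordering):

# ISO-style "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
stamps = [
    '"2020-02-23 06:00:00"',
    '"2019-12-17 12:00:00"',
    '"2020-02-15 00:00:00"',
]
print(sorted(stamps))
# ['"2019-12-17 12:00:00"', '"2020-02-15 00:00:00"', '"2020-02-23 06:00:00"']

With that property, merging reduces to a set union plus a plain sort: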
def merge_files(filenames, outfilename):
    rows = set()
    for filename in filenames:
        with open(filename) as fp:
            # Normalize line endings so a missing final newline in one
            # file doesn't produce a near-duplicate row.
            rows.update(line.rstrip("\n") + "\n" for line in fp)
    with open(outfilename, 'w') as fp:
        fp.writelines(sorted(rows))
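A call matching the hypothetical filenames used in the other answer would be:

merge_files(["merge_01.txt", "merge_02.txt"], "merge_all.txt")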
Upvotes: 1