Gronk

Reputation: 391

Adding non-duplicate lines from one text file to another in Python 3.3

I have 2 text files (new.txt and master.txt). Each has different data stored as such:

Cory 12 12:40:12.016221
Suzy 64 12:40:33.404614
Trent 145 12:40:56.640052

(categorised by the first number appearing on each line)

I have to scan each line of new.txt for the name (e.g. Suzy) and check whether master.txt already contains it. If it doesn't, I add that line to master.txt, placed according to the line's number (e.g. 64 in Suzy 64 12:40:33.404614).

I have written the following script, but it gets stuck in a loop that keeps checking the first line of new.txt. I know why: I have to close fileinput.input('new.txt') before I can open fileinput.input('master.txt') further down the loop, so the outer loop starts over each time. I just don't know how to work around that. I feel like I've highly overcomplicated things for myself, and any help is appreciated.

import fileinput
import re

end_of_file = False

while end_of_file == False:
    for line in fileinput.input('new.txt', inplace=1):
        end_of_file = fileinput.isstdin() #ends while loop if on last line of new.txt
        user_f_line_list = line.split()
        master_f = open('master.txt', 'r')
        master_f_read = master_f.read()
        master_f.close()
        fileinput.close()
        if not re.findall(user_f_line_list[0], master_f_read):
            for line in fileinput.input('master.txt', inplace=1):
                master_line_list = line.split()
                if int(user_f_line_list[1]) <= int(master_line_list[1]):
                    written = False
                    while written == False:
                        written = True
                        print(' '.join(user_f_line_list))
                print(line, end='')
            fileinput.close()

And for reference, master.txt starts with the line startline 0 and ends with the line endline 1000000000000000, so the categorizing can never be out of range.

Upvotes: 0

Views: 44

Answers (1)

user1462309

Reputation: 489

Some suggestions:

  1. Read master.txt into a list with readlines().
  2. Use an OrderedDict from the collections module; it behaves like a regular dict but preserves insertion order. Make each key the unique element, a tuple in this case (e.g. ("Cory", 12)), and make the value whatever comes after it on the line.
  3. Now you can very rapidly check whether an entry is present with if key in my_dict:.
  4. If it isn't, insert it. Inserting in order takes a bit more work, but not much: I would append new entries at the end, convert to a list when everything is done, and sort that list with a custom key function that specifies the ordering.
  5. Write the result back to the file.
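The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the function names (parse_line, format_entry, merge_lines, merge_files) are made up for this example, the key is the (name, number) tuple suggested in step 2 (key on the name alone if duplicates should be detected by name only), and lines with just two fields, such as startline 0, are kept with an empty remainder.

```python
from collections import OrderedDict


def parse_line(line):
    """Split 'Suzy 64 12:40:33.404614' into (('Suzy', 64), '12:40:33.404614')."""
    parts = line.split(None, 2)
    key = (parts[0], int(parts[1]))
    # Two-field lines such as 'startline 0' have nothing after the number.
    rest = parts[2].strip() if len(parts) > 2 else ""
    return key, rest


def format_entry(key, rest):
    """Rebuild a text line from a (name, number) key and the remainder."""
    name, number = key
    return " ".join(part for part in (name, str(number), rest) if part)


def merge_lines(master_lines, new_lines):
    """Return the master lines plus unseen entries from new_lines, sorted by number."""
    entries = OrderedDict()
    for line in master_lines:
        if line.strip():
            key, rest = parse_line(line)
            entries[key] = rest
    for line in new_lines:
        if line.strip():
            key, rest = parse_line(line)
            if key not in entries:  # fast membership test (step 3)
                entries[key] = rest  # append at the end (step 4)
    # Sort once at the end, using the number in each key as the sort key.
    ordered = sorted(entries.items(), key=lambda item: item[0][1])
    return [format_entry(key, rest) for key, rest in ordered]


def merge_files(master_path="master.txt", new_path="new.txt"):
    """Read both files, merge them, and write the result back to master_path."""
    with open(master_path) as f:
        master_lines = f.readlines()
    with open(new_path) as f:
        new_lines = f.readlines()
    merged = merge_lines(master_lines, new_lines)
    with open(master_path, "w") as f:
        f.write("\n".join(merged) + "\n")
```

Calling merge_files() with both files in the current directory would rewrite master.txt in one pass. Because each file is read fully before anything is written, there is no need for fileinput or for reopening a file inside a loop.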

I won't say it's necessarily shorter than your solution, but it is a lot cleaner.

Upvotes: 1
