Reputation: 1591
I am trying to write a script that finds everything between braces {} in a text document. It takes the part of the .txt document inside the {}, sorts it alphabetically, and then writes it back to the text document in place. Example of the text document:
bla bla bla
bla ba bl bla ba bl {apple:banana, this: something else, airplane:hobby}
bla bla bla
bla bla bla
Desired output (sorted alphabetically):
bla bla bla
bla ba bl bla ba bl {airplane:hobby, apple:banana, this: something else}
bla bla bla
bla bla bla
What it's still printing:
bla bla bla
bla ba bl bla ba bl {apple:banana, this: something else, airplane:hobby}
bla bla bla
bla bla bla
My code:
def openFind():
    f = open(inFile, 'r')
    lines = f.read()
    match = re.findall(r'{(.*?)}', lines)
    before = str(match)
    n=1
    for i in xrange(0, len(match), n):
        mydict = match[i:i+n]
        for x in sorted(mydict):
            c = x.split(',')
            newmatch = sorted(c)
            final = str(newmatch)
            print final
    # NOT WORKING BELOW!!! Stuck in loop?
    with open(outFile,'w') as new_file:
        with open(inFile) as old_file:
            for line in old_file:
                new_file.write(line.replace(before, after))
It prints the sorted list as [airplane:hobby, apple:banana, this: something else], but how do I get it to replace the original text in the text document? Ideally it should happen in place, but writing a new .txt file is also acceptable.
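One detail worth checking in the code above: `x.split(',')` keeps the space that follows each comma, and a space sorts before any letter, so the items are not compared by their visible text. A minimal sketch (the helper name `sort_items` is hypothetical) that strips each item before sorting:

```python
def sort_items(inner):
    """Sort the comma-separated items of a brace body, ignoring surrounding spaces."""
    # split(',') alone would keep the leading space, e.g. ' airplane:hobby'
    items = [item.strip() for item in inner.split(',')]
    return ', '.join(sorted(items))


raw = "apple:banana, this: something else, airplane:hobby"
print(sorted(raw.split(',')))  # naive split: space-prefixed items sort first
print(sort_items(raw))         # -> airplane:hobby, apple:banana, this: something else
```

Rejoining with ', ' also restores the original spacing, so the result can be written straight back between the braces.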
Upvotes: 1
Views: 79
Reputation: 3405
The entire program can be written succinctly as follows:
import re

with open("file.txt") as fr:
    content = fr.read()
matches = (match.group(1) for match in re.finditer(r"{(.*?)}", content))
for match in matches:
    repl = ", ".join(sorted(match.split(", ")))
    content = content.replace(match, repl)
with open("file.txt", "w") as fw:
    fw.write(content)
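To check the logic without touching a file, the same steps can be wrapped in a function that works on a string (the name `sort_braces_replace` is hypothetical). One caveat: `str.replace` substitutes every occurrence of the matched text, not only the one inside the braces, which the second call below shows:

```python
import re


def sort_braces_replace(content):
    """Sort the ', '-separated items inside every {...} using str.replace."""
    matches = [m.group(1) for m in re.finditer(r"{(.*?)}", content)]
    for match in matches:
        repl = ", ".join(sorted(match.split(", ")))
        # str.replace rewrites *every* occurrence of `match` in the text
        content = content.replace(match, repl)
    return content


sample = "bla ba bl {apple:banana, this: something else, airplane:hobby}\n"
print(sort_braces_replace(sample))

# The same text outside the braces is rewritten too:
print(sort_braces_replace("{b, a} and also b, a"))  # -> {a, b} and also a, b
```

If the brace contents can also appear elsewhere in the file, replacing by match position (for example with `re.sub` and a callback) avoids this.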
Upvotes: 1
Reputation: 51817
I would approach this problem in pieces. First, you want to be able to read from one file and write to a new file. There are many ways to do this. If your file is small, you can just use readlines(), truncate your original file, and then write it back out.
But I'm going to assume the possibility of huge files (i.e. larger than will easily fit in RAM/swap space, currently several GB in size).
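import os
import tempfile

# Text mode ('w') so str lines can be written; same directory so the final move stays on one filesystem
with tempfile.NamedTemporaryFile('w', delete=False,
                                 dir=os.path.dirname(os.path.abspath(filename))) as temp:
    with open(filename) as infile:
        for line in infile:
            temp.write(line)
# Move the finished copy over the original (os.replace overwrites an existing file)
os.replace(temp.name, filename)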
Now we're reading each line and writing it out to the destination. All that's left is to intercept each line and modify it if necessary:
for line in infile:
    match = re.search(r'{.*?}', line)
    if match:
        # Assumes you only have one "dictionary" per line
        first_part, rest = line.split('{', maxsplit=1)
        # allows for trailing data
        data, last_part = rest.split('}', maxsplit=1)
        # strip the space after each comma so it doesn't affect the sort
        data = [item.strip().split(':') for item in data.split(',')]
        data.sort()
        items = ', '.join(':'.join(item) for item in data)
        line = '{}{{{}}}{}'.format(first_part, items, last_part)
    temp.write(line)
You might have to tweak the exact algorithm, but that's the approach I would take for a problem like this.
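Putting the two pieces together, a minimal sketch of the streaming approach might look like this. It swaps the split-based parsing for `re.sub` with a callback (the technique from another answer here), which also copes with more than one {...} per line. `sort_braces_in_file` and `sort_group` are hypothetical names, and the sketch assumes each {...} group fits on a single line:

```python
import os
import re
import shutil
import tempfile

# The text between { and } on one line (shortest match)
BRACES = re.compile(r"{(.*?)}")


def sort_group(match):
    """Replacement callback: sort the comma-separated items inside one {...}."""
    items = [item.strip() for item in match.group(1).split(",")]
    return "{" + ", ".join(sorted(items)) + "}"


def sort_braces_in_file(filename):
    """Rewrite `filename` line by line, sorting the items inside each {...}."""
    directory = os.path.dirname(os.path.abspath(filename))
    # delete=False keeps the temp file after the with block so it can be moved;
    # creating it in the same directory keeps the final move on one filesystem
    with tempfile.NamedTemporaryFile("w", delete=False, dir=directory) as temp:
        with open(filename) as infile:
            for line in infile:  # only the current line is held in memory
                temp.write(BRACES.sub(sort_group, line))
    # Temp files get restrictive permissions; copy the original file's mode bits
    shutil.copymode(filename, temp.name)
    # Replace the original with the rewritten copy (overwrites the existing file)
    os.replace(temp.name, filename)
```

Calling `sort_braces_in_file("test.txt")` rewrites the file; memory use stays bounded by the longest line rather than the file size.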
Upvotes: 1
Reputation: 26900
This should work:
import re

def openFind():
    with open("test.txt", "r") as in_file:
        data = in_file.read()

    def sub(m):
        l = [s.strip() for s in m.group(1).split(",")]
        l.sort()
        return "{%s}" % (", ".join(l),)

    replacement = re.sub(r'{(.*?)}', sub, data)
    with open("out.txt", "w") as out_file:
        out_file.write(replacement)
I have used re.sub() with a replacement function, so each {...} match is replaced by its sorted contents in a single pass.
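Because `re.sub` calls the function once per match, several {...} groups in the same text are each sorted independently. One thing to be aware of: `.` does not match a newline by default, so a {...} group that spans lines is left untouched unless `re.DOTALL` is passed. A small sketch on an in-memory string (`sort_group` and the sample text are illustrative):

```python
import re


def sort_group(m):
    """Return the {...} group with its comma-separated items sorted."""
    items = [s.strip() for s in m.group(1).split(",")]
    return "{%s}" % ", ".join(sorted(items))


text = "x {c, a, b} y {zeta:1, alpha:2}\nz {b,\na}\n"

# Default: '.' stops at newlines, so the multi-line group is not matched
print(re.sub(r"{(.*?)}", sort_group, text))
# With re.DOTALL the multi-line group is sorted as well
print(re.sub(r"{(.*?)}", sort_group, text, flags=re.DOTALL))
```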
Upvotes: 2
Reputation: 17263
The following code sorts the items between { and } and writes the result back to the same file:
import re

with open('test.txt', 'r+') as f:
    s = f.read()
    r = list(s)
    for mo in re.finditer('{(.*?)}', s):
        d = sorted(mo.group(1).split(', '))
        r[mo.start(1):mo.end(1)] = list(', '.join(d))
    f.seek(0)
    f.write(''.join(r))
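Writing back with `seek(0)` and `write()` only overwrites characters; it does not shorten the file. That works above because sorting and rejoining with the same ', ' separator keeps the text the same length (which is also why the slice positions of later matches stay valid). If the replacement can be shorter, for example after stripping extra spaces, calling `truncate()` after the write drops the leftover tail. A sketch under that assumption (`rewrite_in_place` and `sort_group` are hypothetical helpers):

```python
import re


def sort_group(m):
    """Return the {...} group with its comma-separated items stripped and sorted."""
    items = [s.strip() for s in m.group(1).split(",")]
    return "{" + ", ".join(sorted(items)) + "}"


def rewrite_in_place(path):
    """Sort every {...} group in `path`, overwriting the file in place."""
    with open(path, "r+") as f:
        s = f.read()
        new = re.sub(r"{(.*?)}", sort_group, s)
        f.seek(0)      # go back to the start before overwriting
        f.write(new)
        f.truncate()   # cut off leftovers in case `new` is shorter than `s`
```

This reads the whole file into memory, so it suits small files; for very large ones the temporary-file approach from another answer avoids that.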
Upvotes: 1