I am new to Python. I am trying to use a file with new data (newprops) to replace the old data in a second file. Both files are over 3 MB.
The file with new data looks like this:
PROD 850 30003 0.096043
PROD 851 30003 0.096043
PROD 853 30003 0.096043
PROD 852 30003 0.096043
....
The original file with the old data looks something like this:
CROD 850 123456 123457 123458 123459
PROD 850 30003 0.08
CROD 851 123456 123457 123458 123459
PROD 851 30003 0.07
CROD 852 123456 123457 123458 123459
PROD 852 30003 0.095
CROD 853 123456 123457 123458 123459
PROD 853 30003 0.095
....
Output should be:
CROD 850 123456 123457 123458 123459
PROD 850 30003 0.096043
CROD 851 123456 123457 123458 123459
PROD 851 30003 0.096043
CROD 852 123456 123457 123458 123459
PROD 852 30003 0.096043
CROD 853 123456 123457 123458 123459
PROD 853 30003 0.096043
Here's what I have so far:
import fileinput
def prop_update(newprops,bdffile):
    fnewprops=open(newprops,'r')
    fbdf=open(bdffile,'r+')
    newpropsline=fnewprops.readline()
    fbdfline=fbdf.readline()
    while len(newpropsline)>0:
        fbdf.seek(0)
        propname=newpropsline.split()[1]
        propID=newpropsline.split()[2]
        while len(fbdfline)>0:
            if propID and propname in fbdfline:
                bdffile.write(newpropsline) #i'm stuck here... I want to delete the old line and use updated value
            else:
                fbdfline=fbdfline.readline()
        newpropsline=fnewprops.readline()
    fnewprops.close()
Please help!
You can take every second line from the original file, zip those lines with the new lines, then reopen the original and write the updated lines. This presumes the new file has exactly half as many lines as the original:
from itertools import izip
with open("new.txt") as f,open("orig.txt") as f2:
    lines = f2.readlines()
    zipped = izip(lines[::2],f) # just use zip for python3
    with open("orig.txt","w") as out:
        for pair in zipped:
            out.writelines(pair)
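In Python 3, izip no longer exists and the built-in zip is already lazy, so the same pairing can be sketched as a plain function over lists of lines. This is a minimal sketch assuming, as above, that the CROD lines sit at even indices and that the new file lists its PROD lines in the same order as the CROD lines (the sample new file does not: 853 comes before 852). The helper name pair_lines is hypothetical.

```python
def pair_lines(orig_lines, new_lines):
    """Interleave every second original line (the CROD lines) with the new PROD lines.

    Hypothetical helper: assumes CROD lines are at even indices of orig_lines
    and that new_lines are already in the matching order.
    """
    out = []
    for crod, prod in zip(orig_lines[::2], new_lines):
        out.append(crod)
        out.append(prod)
    return out


orig = [
    "CROD 850 123456 123457 123458 123459\n",
    "PROD 850 30003 0.08\n",
    "CROD 851 123456 123457 123458 123459\n",
    "PROD 851 30003 0.07\n",
]
new = [
    "PROD 850 30003 0.096043\n",
    "PROD 851 30003 0.096043\n",
]
print("".join(pair_lines(orig, new)), end="")
```

To update the real file, read orig.txt with readlines(), pass the result and the lines of new.txt to the helper, then reopen orig.txt with "w" and writelines() the returned list.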
If you want the lines sorted by the second column, strip each line first and add the newlines back when writing. Otherwise a final line without a trailing newline could be sorted into the middle of the file and run into the line after it:
from itertools import izip,islice
with open("new.txt") as f, open("orig.txt") as f2:
    orig = sorted((x.strip() for x in islice(f2, 0, None, 2)), key=lambda x: int(x.split(None, 2)[1]))
    new = sorted((x.strip() for x in f), key=lambda x:int(x.split(None,2)[1]))
    zipped = izip(orig, new)
with open("orig.txt","w") as out:
    for pair in zipped:
        out.write("{}\n{}\n".format(*pair))
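A Python 3 sketch of the sorted version, run on the question's sample data, shows both points at once: the new data lists 853 before 852, and here its last line is assumed to have no trailing newline. The helper name merge_sorted is hypothetical; it slices the list with [::2] where the code above uses islice on the open file, which selects the same lines.

```python
def merge_sorted(orig_lines, new_lines):
    """Pair CROD and PROD lines by the ID in the second column.

    Hypothetical helper: every line is stripped first, so a final line with no
    trailing newline cannot end up in the middle glued to its neighbour; the
    newlines are added back by the format string.
    """
    def by_id(line):
        # "CROD 850 123456 ..." -> 850
        return int(line.split(None, 2)[1])

    crods = sorted((x.strip() for x in orig_lines[::2]), key=by_id)
    prods = sorted((x.strip() for x in new_lines), key=by_id)
    return "".join("{}\n{}\n".format(c, p) for c, p in zip(crods, prods))


orig = [
    "CROD 850 123456 123457 123458 123459\n",
    "PROD 850 30003 0.08\n",
    "CROD 851 123456 123457 123458 123459\n",
    "PROD 851 30003 0.07\n",
    "CROD 852 123456 123457 123458 123459\n",
    "PROD 852 30003 0.095\n",
    "CROD 853 123456 123457 123458 123459\n",
    "PROD 853 30003 0.095\n",
]
# New data in the question's order (853 before 852); the last line has no newline.
new = [
    "PROD 850 30003 0.096043\n",
    "PROD 851 30003 0.096043\n",
    "PROD 853 30003 0.096043\n",
    "PROD 852 30003 0.096043",
]
print(merge_sorted(orig, new), end="")
```

Without the strip() calls, the unterminated "PROD 852" line would sort ahead of "PROD 853" and the two would be written on the same line.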
Output:
CROD 850 123456 123457 123458 123459
PROD 850 30003 0.096043
CROD 851 123456 123457 123458 123459
PROD 851 30003 0.096043
CROD 852 123456 123457 123458 123459
PROD 852 30003 0.096043
CROD 853 123456 123457 123458 123459
PROD 853 30003 0.096043
If the lengths differ, you can use itertools.izip_longest (itertools.zip_longest in Python 3) with fillvalue="" so you don't lose any data.
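A short Python 3 sketch of the difference, using made-up input where one PROD line is missing: zip stops at the shorter input, while zip_longest pads it with the fill value.

```python
from itertools import zip_longest  # named izip_longest in Python 2

crods = [
    "CROD 850 123456 123457 123458 123459",
    "CROD 851 123456 123457 123458 123459",
]
prods = ["PROD 850 30003 0.096043"]  # the PROD 851 line is missing

padded = list(zip_longest(crods, prods, fillvalue=""))
truncated = list(zip(crods, prods))

print(padded)     # CROD 851 is kept, paired with ""
print(truncated)  # CROD 851 is silently dropped
```

With the "{}\n{}\n" format string used above, a padded pair is written as the CROD line followed by an empty line, so you may want to check for empty fills before writing.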
If the old file is already in order, you can drop the sorted call on f2 and use [x.strip() for x in f2.readlines()[::2]] instead; keep the strip, because the format string adds its own newlines. If it is not in order, sorting makes sure all lines are ordered by the second column regardless of their original order.
You can use a dict to index the new data, then copy the original file to a new file line by line, updating values from the index as you go. It looks like the first three fields should be the key ("PROD 850 30003"), and they can be pulled out with a regex such as (PROD\s+\d+\s+\d+).
import re

_split_new = re.compile(r"(PROD\s+\d+\s+\d+)(.*)")

# create an index for the PROD items to be updated
# this might be a bit more understandable...
#with open('updates.txt') as updates:
#    new_data = {}
#    for line in updates:
#        match = _split_new.match(line)
#        if match:
#            key, value = match.groups()
#            new_data[key] = value

# ... but this is fancier (and likely faster)
with open('updates.txt') as updates:
    new_data = dict(match.groups()
        for match in (_split_new.search(line) for line in updates)
        if match)

# then process the updates
with open('origstuff.txt') as orig, open('newstuff.txt', 'w') as newstuff:
    # for each line in the original...
    for line in orig:
        match = _split_new.match(line)
        # ... see if it's a PROD line
        if match:
            key, value = match.groups()
            # ... and rewrite with value from indexing dict (defaulting to current value)
            newstuff.write("%s%s\n" % (key, new_data.get(key, value)))
        else:
            # ... or just the original line
            newstuff.write(line)
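Since the question asks to update the original file rather than produce a separate one, here is a Python 3 sketch that wraps the same dict-index approach in the question's prop_update(newprops, bdffile) signature. It assumes it is acceptable to write a temporary file next to the original and then swap it into place with os.replace; the temporary-file step is an addition, not part of the answer above. tempfile.mkstemp creates the file with owner-only permissions, so the sketch copies the original's mode with shutil.copymode first.

```python
import os
import re
import shutil
import tempfile

# Same pattern as above: (key)(rest of line)
_split_new = re.compile(r"(PROD\s+\d+\s+\d+)(.*)")


def prop_update(newprops, bdffile):
    """Replace PROD values in bdffile with those from newprops, in place.

    Sketch only: builds the dict index, writes the updated lines to a temporary
    file in the same directory, then renames it over the original.
    """
    with open(newprops) as updates:
        new_data = dict(m.groups() for m in map(_split_new.search, updates) if m)

    folder = os.path.dirname(os.path.abspath(bdffile))
    fd, tmp_path = tempfile.mkstemp(dir=folder, text=True)
    try:
        # Wrap the temp fd first so it is closed even if opening bdffile fails
        with os.fdopen(fd, "w") as out, open(bdffile) as orig:
            for line in orig:
                m = _split_new.match(line)
                if m:
                    key, value = m.groups()
                    # keep the old value when the key has no update
                    out.write("%s%s\n" % (key, new_data.get(key, value)))
                else:
                    out.write(line)
        shutil.copymode(bdffile, tmp_path)  # mkstemp files default to mode 0600
        os.replace(tmp_path, bdffile)  # atomic on the same filesystem
    except BaseException:
        os.remove(tmp_path)
        raise
```

Usage would be prop_update("updates.txt", "origstuff.txt"). Reading each file once keeps this linear in file size, unlike a nested loop that rescans the original for every new line.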