Reputation: 7280
I have a large XML file, ~30 MB.
Every now and then I need to update some of the values. I am using the ElementTree module to modify the XML. Currently I fetch the entire file, update it, and then upload it again, so there is ~60 MB of data transfer every time. Is there a way to update the file remotely?
I am using the following code to update the file.
import xml.etree.ElementTree as ET

tree = ET.parse("feed.xml")
root = tree.getroot()

# SKUs to update and the new quantity for each (matched by position).
skus = ["RUSSE20924", "PSJAI22443"]
qtys = [2, 3]

for child in root:
    sku = child.find("Product_Code").text
    if sku in skus:
        print("found")
        i = skus.index(sku)
        child.find("Quantity").text = str(qtys[i])
        child.set('updated', 'yes')

tree.write("feed.xml")
Upvotes: 0
Views: 276
Reputation: 1325
Modifying a file directly via FTP without uploading the entire thing is not possible except when appending to a file.
The reason is that there are only three commands in FTP that actually modify a file (Source):
APPE: Appends to a file
STOR: Uploads a file
STOU: Creates a new file on the server with a unique name
Cache the remote file locally and track changes to the file using the MDTM command.
Pros:
Cons:
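A minimal sketch of the caching idea, assuming Python's ftplib and a server that answers MDTM with the usual "213 YYYYMMDDHHMMSS" reply. The function names, the cache file and the stamp file are hypothetical choices for illustration, not part of any library:

```python
from datetime import datetime
from pathlib import Path


def parse_mdtm(response):
    """Parse an MDTM reply such as '213 20240131120000' into a datetime."""
    code, _, stamp = response.partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: %r" % response)
    # Some servers append fractional seconds; keep the first 14 digits only.
    return datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")


def refresh_cache(ftp, remote_name, cache_path, stamp_path):
    """Download remote_name only if its MDTM differs from the cached one.

    ftp is expected to behave like a connected ftplib.FTP object.
    Returns True if a download happened, False if the cache was reused.
    """
    cache_path, stamp_path = Path(cache_path), Path(stamp_path)
    remote_stamp = parse_mdtm(ftp.sendcmd("MDTM " + remote_name)).isoformat()
    if cache_path.exists() and stamp_path.exists():
        if stamp_path.read_text() == remote_stamp:
            return False  # cached copy is still current
    with open(cache_path, "wb") as fh:
        ftp.retrbinary("RETR " + remote_name, fh.write)
    stamp_path.write_text(remote_stamp)
    return True
```

With a connected `ftplib.FTP` instance you would call something like `refresh_cache(ftp, "feed.xml", "feed.cache.xml", "feed.stamp")` before editing. This only saves the download half of the transfer when the file has not changed; the upload after editing is still a full STOR.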
Split up your XML into several files. (One per product code?)
This way you only have to download the data that you actually need.
Pros:
Cons:
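A rough sketch of the splitting approach, assuming the feed has the shape implied by the question's code: a root element whose children each contain a Product_Code and a Quantity element. Using the product code as the file name is an assumption and only works if the codes are safe as file names; `split_feed` and `update_quantity` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_feed(source_path, out_dir):
    """Write each child of the feed's root to <out_dir>/<Product_Code>.xml."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for product in ET.parse(str(source_path)).getroot():
        code = product.findtext("Product_Code")
        if not code:
            continue  # skip entries without a usable product code
        target = out_dir / (code.strip() + ".xml")
        ET.ElementTree(product).write(
            str(target), encoding="utf-8", xml_declaration=True
        )
        written.append(target)
    return written


def update_quantity(product_file, quantity):
    """Rewrite the Quantity of a single small per-product file."""
    tree = ET.parse(str(product_file))
    root = tree.getroot()
    root.find("Quantity").text = str(quantity)
    root.set("updated", "yes")
    tree.write(str(product_file), encoding="utf-8", xml_declaration=True)
```

After a one-time split, an update only needs to transfer the small file for the affected product code, for example `RUSSE20924.xml`, instead of the whole feed.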
If the storage server supports it, switching to a delta synchronization protocol like rsync would help a lot, because such protocols transmit only the changes (with little overhead).
Pros:
Cons:
You already pointed out that you can't, but it would still be the best solution.
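For completeness, if rsync ever becomes available, pushing the edited file could look roughly like this. It is a sketch that assumes the rsync binary is installed locally and that the server accepts rsync over SSH; the remote spec in the usage note is a placeholder, and the helper names are hypothetical:

```python
import subprocess


def build_rsync_command(local_path, remote_spec):
    """Build an rsync invocation that pushes local_path to remote_spec.

    -a preserves file attributes, -z compresses data in transit.
    """
    return ["rsync", "-az", local_path, remote_spec]


def push_with_rsync(local_path, remote_spec):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(local_path, remote_spec), check=True)
```

A call such as `push_with_rsync("feed.xml", "user@example.com:/srv/feeds/feed.xml")` would then send only the changed parts of the file, since rsync uses its delta-transfer algorithm for remote destinations.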
As somebody in the comments already pointed out, switching to a network file system (like NFS or CIFS/SMB) would not really help, because you cannot actually change parts of the file unless the new data has the exact same length.
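The length restriction is easy to see with an ordinary local file, which behaves the same way in this respect. The sketch below uses a made-up two-element document and hypothetical helper names; it overwrites bytes in place without shifting anything that follows:

```python
import os
import tempfile


def overwrite_at(path, offset, data):
    """Write data over the bytes already stored at offset; nothing is shifted."""
    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(data)


def demo(new_value):
    """Replace the quantity b"2" in a tiny sample document with new_value."""
    original = b"<Quantity>2</Quantity><Price>10</Price>"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(original)
        overwrite_at(path, original.index(b"2"), new_value)
        with open(path, "rb") as fh:
            return fh.read()
    finally:
        os.remove(path)


print(demo(b"3"))   # same length: the document stays intact
print(demo(b"12"))  # one byte longer: the "<" of "</Quantity>" is clobbered
```

Changing the quantity from 2 to 3 works, but changing it to 12 overwrites the start of the closing tag, so everything after the edit would have to be rewritten anyway.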
Unless you can do delta synchronization, I'd suggest implementing some caching on the client side first and, if that doesn't help enough, then splitting up your files.
Upvotes: 6