Reputation: 43
I need to add a carriage return at the end of each line of a CSV file stored in a GCS bucket, and store the modified data in a new file in the same bucket. What we observe is that, although we write the data to a new CSV file, the original file appears to be overwritten.
Below is the code snippet that I am using for the task.
Can someone help me achieve this without changing the data in the original file?
Thanks, Hemaraju
from google.cloud import storage
storage_client = storage.Client.from_service_account_json(r"C:\Users\XXXXX\Downloads\GCP-Key\XXXX.json")
bucket = storage_client.get_bucket('test-export-bucket-XXXX')
blob = bucket.blob('new/test_file.csv')
destination_blob = bucket.blob('test/modified_test_file.csv')
data = blob.download_as_string()
count = 0
for line in data.splitlines():
    count += 1
    print(line)
    newline = line.decode('utf8') + '\r\n'
    print(newline)
    destination_blob.upload_from_string(newline)
Upvotes: 1
Views: 1092
Reputation: 331
Just here to illustrate what @Guillaume explained in case it wasn't clear. Please select his answer as the correct one.
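text_buffer = ""
for line in data.splitlines():
    # Accumulate every transformed line instead of uploading it immediately.
    text_buffer += line.decode('utf8') + '\r\n'
# Upload once, after the loop, so the blob contains the whole file.
destination_blob.upload_from_string(text_buffer)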
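For what it's worth, the same buffering can be done on the raw bytes without decoding first. This is only a sketch: the helper name `with_crlf_endings` is made up for illustration, and it assumes no CSV field contains an embedded line break inside quotes (those would be rewritten too).

```python
def with_crlf_endings(data: bytes) -> bytes:
    """Return data with every line terminated by CRLF (hypothetical helper)."""
    # bytes.splitlines() drops the existing terminators (\n, \r\n or \r),
    # so input that already uses CRLF does not end up with doubled endings.
    return b"".join(line + b"\r\n" for line in data.splitlines())
```

The result could then be passed to `destination_blob.upload_from_string(...)` in a single call, as in the snippet above.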
Upvotes: 1
Reputation: 75920
First, your code is wrong: you can't write a blob line by line. Each call to upload_from_string replaces the whole object, so every write overwrites the previous one and, at the end, only the last line is stored.
Instead, build a buffer holding the transformed file and write it once, after your for loop (not inside it).
Second, it is not normal for the original file to be overwritten. Are you sure that is what happens? The destination blob is clearly a different object, and you can't overwrite existing data like that.
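To make both points concrete, here is a rough sketch of the whole flow. The function name `copy_with_crlf` and the same-name guard are my own additions, not something from the question; `bucket` is assumed to be a `google.cloud.storage` bucket like the one obtained in the question, and `download_as_bytes` is assumed to be available in the installed client version (older releases only offer `download_as_string`).

```python
def copy_with_crlf(bucket, source_name, destination_name):
    """Write a CRLF-terminated copy of one blob to another blob (hypothetical helper)."""
    # Guard against the only way this code could touch the original object.
    if source_name == destination_name:
        raise ValueError("source and destination must be different object names")

    source_blob = bucket.blob(source_name)
    destination_blob = bucket.blob(destination_name)

    # Read the whole source object; this never modifies it.
    data = source_blob.download_as_bytes()

    # Build the complete transformed content in memory first.
    converted = b"".join(line + b"\r\n" for line in data.splitlines())

    # Single upload after the transformation: the destination gets every line.
    destination_blob.upload_from_string(converted, content_type="text/csv")
    return destination_blob
```

With the bucket from the question this would be called as `copy_with_crlf(bucket, 'new/test_file.csv', 'test/modified_test_file.csv')`. If the original still looks changed afterwards, it is worth checking that you are inspecting the right object name.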
Upvotes: 1