Dave
Dave

Reputation: 19250

How do I replace new lines characters in CSV files downloaded using the Google sheets API?

I'm using Python 3.9 and the Google Sheets SDK for Python v 3.6.0. I want to download a Google sheet as a CSV and I would like to replace any new line characters that occur in the cell data with nothing. I tried the below

    client = gspread.authorize(creds)
    sheet = client.open('ChiCommons_Directory')

    # get the third sheet of the Spreadsheet.  This
    # contains the data we want
    sheet_instance = sheet.get_worksheet(3)

    url = 'https://docs.google.com/spreadsheets/d/' + sheet.id + '/gviz/tq?tqx=out:csv&gid=' + str(sheet_instance.id)
    headers = {'Authorization': 'Bearer ' + client.auth.token}
    res = requests.get(url, headers=headers)
    output = re.sub(r'[\n\r]', '', res.text)
    print(output)

This doesn't seem to be replacing anything. Specifically the line output = re.sub(r'[\n\r]', '', res.text) does not seem to be doing its job. However, I only want to replace new lines in teh cell data, not the new lines that divide each row of data. Not sure the best way to do that.

Upvotes: 0

Views: 612

Answers (1)

Tanaike
Tanaike

Reputation: 201553

I believe your goal as follows.

  • You want to replace the line breaks in each cell.
  • You don't want to replace the line breaks for each row in the CSV data.

Modification points:

  • When I tested your script, the line breaks for each row are also removed. So, in this case, I would like to propose the following flow.
    1. Parse the CSV data and convert it to a list.
    2. Remove the line breaks in each cell.
    3. Convert the list to the CSV data.

When above points are reflected to your script, it becomes as follows.

Modified script:

Please modify your script as follows.

From:
output = re.sub(r'[\n\r]', '', res.text)
print(output)
To:
ar = csv.reader(io.StringIO(res.text, newline=""))
output = "\n".join([",".join(map(str, ['"' + c.replace('\n', '') + '"' for c in r])) for r in ar])
# or, output = "\n".join([",".join(map(str, ['"' + re.sub(r'[\n\r]', '', c) + '"' for c in r])) for r in ar])
print(output)
  • In this case, please use import csv and import io.

  • If you don't want to add " for each cell, please modify as follows.

      output = "\n".join([",".join(map(str, [c.replace('\n', '') for c in r])) for r in ar])
    

Upvotes: 1

Related Questions