Isaac Rivera

Reputation: 103

Remove xml character in python

I have the following code, which takes information from an XML file and saves some data in a csv file.

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('file.xml')
root = tree.getroot()

title = []
category = []
url = []
prod = []

def find_title():
    for t in root.findall('solution/head'):
        title.append(t.find('title').text)

    for c in root.findall('solution/body'):
        category.append(c.find('category').text)

    for u in root.findall('solution/body'):
        url.append(u.find('video').text)

    for p in root.findall('solution/body'):
        prod.append(p.find('product').text)

find_title()

headers = ['Title', 'Category', 'Video URL','Product']

def save_csv():
    with open('titles.csv', 'w') as f:
        f_csv = csv.writer(f, lineterminator='\r')
        f_csv.writerow(headers)
        f.write(''.join('{},{},{},{}\n'.format(title, category, url, prod) for title, category, url, prod in zip(title, category, url, prod)))

save_csv()

I have found an issue with text that contains ',': the comma splits the value into separate fields in the output, e.g.:

<title>Add, Change, or Remove Transitions between Slides</title>

is getting saved as [Add, Change, or Remove Transitions between Slides], which makes sense since this is a CSV file; however, I would like to keep the whole title together.

So is there any way to remove the ',' from the title text, or can I add more code to work around the ','?

Thanks in advance

Upvotes: 0

Views: 503

Answers (1)

Tom Dalton

Reputation: 6190

It's not clear why you're writing the row data with a file.write() call rather than using the csv writer's writerow method (which you are already using for the header row). That method takes care of quoting and special-character issues for data containing quotes and commas.

Change:

f.write(''.join('{},{},{},{}\n'.format(title, category, url, prod) for title, category, url, prod in zip(title, category, url, prod)))

to:

for row in zip(title, category, url, prod):
    f_csv.writerow(row)

and your CSV should work as expected, assuming your CSV reader handles the quoted fields.
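To see the quoting in action, here is a minimal sketch that writes to an in-memory buffer instead of titles.csv and then reads the result back. The sample rows, the example.com URLs and the write_rows helper are made up for illustration and stand in for the lists built from the XML; the writer is left at its default settings rather than the lineterminator used in the question.

```python
import csv
import io

headers = ['Title', 'Category', 'Video URL', 'Product']

# Hypothetical sample rows standing in for the values parsed from the XML.
rows = [
    ('Add, Change, or Remove Transitions between Slides',
     'Slides', 'http://example.com/video1', 'PowerPoint'),
    ('He said "hello"', 'Misc', 'http://example.com/video2', 'Word'),
]


def write_rows(f, headers, rows):
    """Write the header and all data rows through the same csv writer."""
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(rows)  # fields containing commas or quotes get quoted


# Write to an in-memory buffer so the example does not touch the disk.
buffer = io.StringIO()
write_rows(buffer, headers, rows)
csv_text = buffer.getvalue()

# Read the text back; each quoted title comes back as a single field.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

When writing to a real file, the csv module documentation recommends opening it with newline='' (for example open('titles.csv', 'w', newline='')) so the writer controls the line endings itself.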

Upvotes: 2
