Felisep

Reputation: 23

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

I'm trying to parse a directory containing a collection of XML files from RSS feeds. Similar code works fine for another directory, so I can't figure out the problem. I want to return the items so I can write them to a CSV file. The error I'm getting is:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

Here is the site I've collected RSS feeds from: https://www.ba.no/service/rss

It worked fine for: https://www.nrk.no/toppsaker.rss and https://www.vg.no/rss/feed/?limit=10&format=rss&categories=&keywords=

Here is the function for this RSS:

import os
import xml.etree.ElementTree as ET
import csv

def baitem():
    basepath = "../data_copy/bergens_avisen"

    table = []

    for fname in os.listdir(basepath):
        if fname != "last_feed.xml":
            files = ET.parse(os.path.join(basepath, fname))
            root = files.getroot()
            items = root.find("channel").findall("item")
            #print(items)
        for item in items:
            date = item.find("pubDate").text
            title = item.find("title").text
            description = item.find("description").text
            link = item.find("link").text
            table.append((date, title, description, link))
    return table

I tested with print(items) and it prints all the item objects. Could the problem be in how the XML files are written?

Upvotes: 0

Views: 1118

Answers (1)

Felisep

Reputation: 23

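I asked a friend, who suggested wrapping the parsing in a try/except statement. That revealed a .DS_Store file in the directory. This is a hidden metadata file that macOS creates in folders; it is not XML, so ET.parse fails on it at line 1, column 0. I'm providing the solution for those who might experience the same problem in the future.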

import os
import xml.etree.ElementTree as ET

def baitem():
    basepath = "../data_copy/bergens_avisen"

    table = []

    for fname in os.listdir(basepath):
        try:
            # Skip the feed snapshot and the hidden macOS metadata file
            if fname != "last_feed.xml" and fname != ".DS_Store":
                files = ET.parse(os.path.join(basepath, fname))
                root = files.getroot()
                items = root.find("channel").findall("item")
                for item in items:
                    date = item.find("pubDate").text
                    title = item.find("title").text
                    description = item.find("description").text
                    link = item.find("link").text
                    table.append((date, title, description, link))
        except Exception as e:
            # Print the offending file name so non-XML files are easy to spot
            print(fname, e)
    return table
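
A slightly more general variant of the fix above, as a rough sketch: instead of listing .DS_Store by name, it only opens files whose names end in .xml and catches ET.ParseError specifically. The function name parse_feed_dir and the skip parameter are made up for this illustration, and it assumes every feed has the usual rss/channel/item layout.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_feed_dir(basepath, skip=("last_feed.xml",)):
    """Collect (date, title, description, link) tuples from the *.xml files in basepath.

    parse_feed_dir and skip are hypothetical names used only for this sketch.
    """
    table = []
    # Only names ending in .xml are considered, so .DS_Store is never opened.
    for path in sorted(Path(basepath).glob("*.xml")):
        if path.name in skip:
            continue
        try:
            root = ET.parse(str(path)).getroot()
        except ET.ParseError as err:
            # Report the broken file and keep going with the rest.
            print(f"Skipping {path.name}: {err}")
            continue
        channel = root.find("channel")
        if channel is None:
            # Assumed layout is <rss><channel><item>...; skip anything else.
            continue
        for item in channel.findall("item"):
            # findtext returns None for a missing tag instead of raising.
            table.append((
                item.findtext("pubDate"),
                item.findtext("title"),
                item.findtext("description"),
                item.findtext("link"),
            ))
    return table
```

Calling parse_feed_dir("../data_copy/bergens_avisen") should return the same kind of list as baitem(), with items that lack a tag showing up as None rather than stopping the loop.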

Upvotes: 1
