Address attribute with XPath (Dublin Core)

Question

I receive XML responses via SRU in Dublin Core format. There are several records; this is one example:

Die EU im Einsatz gegen den Klimawandel : der EU-Emissionshandel - ein offenes System, das weltweit Innovationen fördert / [Europäische Kommission]
Europäische Kommission
[Luxemburg] : [Amt für Amtliche Veröff. der Europ. Gemeinschaften]
2005
ger
92-894-9187-6 geh.
992017882
360 Soziale Probleme, Sozialdienste, Versicherungen
330 Wirtschaft
20 S.
3

I am trying to address the dc:identifier element with the value 992017882 (the one whose xsi:type attribute is dnb:IDN), but I cannot get this to work. Since I have several of these records, and they contain one, two, three or more dc:identifier elements, I use a function to extract the content of the XML tags I need and parse the results into a dataframe afterwards. This works well for elements such as dc:title, but as soon as I also need to address attributes, I am at a loss. I have tried various things, but the problem seems to be that I need to address two namespaces (?). The current function looks like this:

import unicodedata
from lxml import etree as ET

def parse_record(record):
    
    ns = {"dc": "http://purl.org/dc/elements/1.1/"}
    xml = ET.fromstring(unicodedata.normalize("NFC", str(record)))
    
    #idn
    idn = xml.xpath(".//dc:identifier[@xsi:type='dnb:IDN']", namespaces=ns)
    
    try:
        idn = idn.text
    except:
        idn = 'fail'
    
    # titel
    titel = xml.xpath('.//dc:title', namespaces=ns)
    
    try:
        titel = titel[0].text
        #titel = unicodedata.normalize("NFC", titel)
    except:
        titel = "unknown"
        
    meta_dict = {"idn":idn, "titel":titel}
    
    return meta_dict

I can run the function without any problems, but when I try to parse the response into a dataframe with the following code:

output = [parse_record(record) for record in records]
df = pd.DataFrame(output)
df

I get the error message: "XPathEvalError: Undefined namespace prefix"

Can anyone help?

Alexandra Dudkina · Accepted Answer

As pointed out in the comments, the dictionary containing the namespace declarations must also define the xsi prefix, because the XPath expression uses it in @xsi:type:

ns = {
    "dc": "http://purl.org/dc/elements/1.1/",
    # adjust if the response declares a different URI for the xsi prefix
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
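
To see the extended namespace dictionary in action, here is a minimal sketch rather than a definitive implementation. It makes two assumptions: the sample record is a hypothetical, trimmed-down reconstruction (only the text values come from the question; the record wrapper element and the tel:ISBN type are guesses), and it uses the standard library's xml.etree.ElementTree with findall instead of lxml so that it runs without extra dependencies. The same dictionary can be passed to lxml's xpath(..., namespaces=ns). Note that the lookup returns a list, so the first match has to be picked before reading .text; the value 'dnb:IDN' is a plain string, so the dnb prefix does not need a namespace entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down record: the <record> wrapper and the "tel:ISBN"
# type are assumptions; only the text values are taken from the question.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Die EU im Einsatz gegen den Klimawandel</dc:title>
  <dc:identifier xsi:type="tel:ISBN">92-894-9187-6 geh.</dc:identifier>
  <dc:identifier xsi:type="dnb:IDN">992017882</dc:identifier>
</record>"""

# Every prefix used in a path expression needs an entry here.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}


def parse_record(record):
    """Return the IDN and title of one record as a dict."""
    xml = ET.fromstring(record)

    # findall returns a list of matching elements (possibly empty).
    idn_nodes = xml.findall(".//dc:identifier[@xsi:type='dnb:IDN']", NS)
    title_nodes = xml.findall(".//dc:title", NS)

    # Fall back to placeholder values when an element is missing.
    idn = idn_nodes[0].text if idn_nodes else "fail"
    titel = title_nodes[0].text if title_nodes else "unknown"
    return {"idn": idn, "titel": titel}


print(parse_record(SAMPLE))
```

Checking for an empty list instead of wrapping the lookup in a bare try/except keeps genuine errors, such as an undefined prefix, visible instead of silently turning them into 'fail'.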
