A R.Torres

Reputation: 33

Parse XML to CSV using Python: NoneType error

I am trying to parse an XML file to CSV, but I am getting the error below. I have tested the same logic with a simpler XML file and it seems to work. Below are the error, the XML file, the Python code, and my desired output. So far I have only added two of my columns. I have been looking at this for hours, so another set of eyes would be much appreciated. Thank you!

Error:

name = member.find('CaseName').tag
AttributeError: 'NoneType' object has no attribute 'tag'

XML File:

<?xml version="1.0" encoding="UTF-8"?>
<Nuix version="7.2.2" architecture="amd64">
  <Export
    startTime="Sun Feb 25 22:07:07 2018 (America/Chicago)"
    endTime="Sun Feb 25 22:08:03 2018 (America/Chicago)"
    exportDuration="55s"
    processingDuration="55s">

    <ExportConfiguration>

      <LoadFiles>
      </LoadFiles>

      <MessageFormat>NATIVE</MessageFormat>
      <ExportDirectory>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix\OCR cache directory</ExportDirectory>
      <SeparateEmailAttachments>false</SeparateEmailAttachments>
      <RegenerateNatives>false</RegenerateNatives>
      <RegeneratePdfs>false</RegeneratePdfs>
      <FindTopLevelItems>false</FindTopLevelItems>
      <DescendantItems>false</DescendantItems>
      <ExportContainers>false</ExportContainers>
      <SortOrder>position</SortOrder>

      <CaseName>Brooklyn</CaseName>
      <CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>

      <TimeZone>America/Chicago</TimeZone>


      <Numbering>
        <Strategy>Document ID numbering</Strategy>
        <DocumentPagesInSameFolder>true</DocumentPagesInSameFolder>
        <FamilyDocumentsInSameFolder>false</FamilyDocumentsInSameFolder>
        <FirstItemNumber>DOC-000000001</FirstItemNumber>
      </Numbering>

      <Imaging>
        <ImagingProfile>Default</ImagingProfile>
      </Imaging>

      <Naming>
        <NativeNamingScheme>Page only</NativeNamingScheme>
        <PdfNamingScheme>Page only</PdfNamingScheme>
      </Naming>
      <OcrSettings>
          <Recognition>High Quality - Slow</Recognition>
          <Deskewed/>
          <UpdateTextStore append="true"/>
          <Rotation>Auto</Rotation>
          <Languages>English</Languages>
      </OcrSettings>

      <ResemblanceThreshold>0.85</ResemblanceThreshold>

    </ExportConfiguration>

    <ExportStatistics>
      <SelectedItems>4</SelectedItems>
      <ExcludedCount>0</ExcludedCount>
      <TotalItemsToExport>4</TotalItemsToExport>
      <FailedItems>0</FailedItems>
      <DocumentNumbers>
        <First></First>
        <Last></Last>
      </DocumentNumbers>
    </ExportStatistics>

    <ExportStageDetails>
      <Stage
        name="WORK_QUEUE"
        successfulItems="4"
        failedItems="0"
        duration="1s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="NATIVE"
        successfulItems="4"
        failedItems="0"
        duration="33s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="STORED_EMAIL_FIXUP"
        successfulItems="4"
        failedItems="0"
        duration="1s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="PDF"
        successfulItems="4"
        failedItems="0"
        duration="1s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="BINARY_STORE"
        successfulItems="0"
        failedItems="0"
        duration="0s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="OCR_INITIALISATION"
        successfulItems="4"
        failedItems="0"
        duration="0s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="OCR"
        successfulItems="4"
        failedItems="0"
        duration="17s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="POST_OCR"
        successfulItems="4"
        failedItems="0"
        duration="0s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
      <Stage
        name="TEXT_REPLACEMENT"
        successfulItems="4"
        failedItems="0"
        duration="1s">

        <SlipsheetItemDetails>
        </SlipsheetItemDetails>

        <FailedItemDetails>
        </FailedItemDetails>
      </Stage>
    </ExportStageDetails>

    <FileStatistics>
      <NativeFilesExported>3</NativeFilesExported>
      <NativeFilesFromStore>0</NativeFilesFromStore>
      <NativeFilesExportedInline>0</NativeFilesExportedInline>
      <NativeFilesExportedParallel>3</NativeFilesExportedParallel>
      <NativeFilesExportedParallelLocal>0</NativeFilesExportedParallelLocal>
      <NativeFilesWithInvalidTimes>0</NativeFilesWithInvalidTimes>
      <NativePlaceHolderFilesExported>0</NativePlaceHolderFilesExported>
      <NativeFilesRegenerated>0</NativeFilesRegenerated>
      <TextFilesExported>0</TextFilesExported>
      <TextPlaceHolderFilesExported>0</TextPlaceHolderFilesExported>
      <PdfFilesExported>0</PdfFilesExported>
      <PdfFilesStamped>0</PdfFilesStamped>
      <TiffFilesExported>0</TiffFilesExported>

      <PdfDetails>
        <PdfFilesFromStore>0</PdfFilesFromStore>
        <PdfFilesRegenerated>0</PdfFilesRegenerated>
        <PdfFilesExportedInline>0</PdfFilesExportedInline>
        <PdfFilesExportedParallel>0</PdfFilesExportedParallel>
        <PdfFilesExportedParallelLocal>0</PdfFilesExportedParallelLocal>
        <UserImportedPdfs>0</UserImportedPdfs>
        <PrintedPdfs>0</PrintedPdfs>
        <UnformattedTextPdfs>0</UnformattedTextPdfs>
        <ItemEncryptedPdfs>0</ItemEncryptedPdfs>
        <UnprintableItemPdfs>0</UnprintableItemPdfs>
      </PdfDetails>
    </FileStatistics>

    <PageCountStatistics>
      <PdfPages>0</PdfPages>
      <StampedPages>0</StampedPages>
      <FailedStampedPages>0</FailedStampedPages>
      <AveragePageCount>0.0</AveragePageCount>
    </PageCountStatistics>

    <ThroughputStatistics>
      <NativeDocRate>0.0857363321997085</NativeDocRate>
      <PdfDocRate>0.0</PdfDocRate>
      <StampedDocRate>0.0</StampedDocRate>
      <PdfPageRate>0.0</PdfPageRate>
      <StampingPageRate>0.0</StampingPageRate>
    </ThroughputStatistics>

    <MimeTypeStatistics>
      <MimeTypes>
        <MimeType name="application/pdf" count="4" />
      </MimeTypes>
    </MimeTypeStatistics>

  </Export>
</Nuix>

Python Code:

    import xml.etree.ElementTree as ET
    import csv

    tree = ET.parse('D:\\Users\\eferse\\Desktop\\XML_parsing\\summary-report.xml')
    root = tree.getroot()

    # open a file for writing

    Resident_data = open('D:\\Users\\eferse\\Desktop\\XML_parsing\\Nuix Export XML Parse_PythonOutput.csv', 'w')

    # create the csv writer object

    csvwriter = csv.writer(Resident_data)
    resident_head = []

    count = 0
    for member in root.findall('Export'):
        resident = []
        address_list = []
        if count == 0:
            name = member.find('CaseName').tag
            resident_head.append(CaseName)
            location= member.find('CaseLocation').tag
            resident_head.append(CaseLocation)

            csvwriter.writerow(resident_head)
            count = count + 1

        name = member.find('CaseName').text
        resident.append(CaseName)
        location= member.find('CaseLocation').text
        resident.append(CaseLocation)


        csvwriter.writerow(resident)
    Resident_data.close()

Desired Output: Output

Upvotes: 1

Views: 115

Answers (1)

johnashu

Reputation: 2211

I have used indexing to access the child elements in question. This is often easier when you know where the information sits in the tree.

You can check this using the following:

for child in root[0]:
    print(child.tag, child.attrib)

You can navigate further by chaining indexes as deep as you like, e.g. root[0][0][1].
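As a rough illustration of chained indexing, here is a minimal sketch that parses a trimmed copy of the report from a string. The `SAMPLE` text and the `path_tags` name are only for this example; the real file has many more elements, so positions there may differ.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the report; only a few elements are kept for illustration.
SAMPLE = """<Nuix version="7.2.2" architecture="amd64">
  <Export exportDuration="55s">
    <ExportConfiguration>
      <LoadFiles></LoadFiles>
      <MessageFormat>NATIVE</MessageFormat>
      <CaseName>Brooklyn</CaseName>
    </ExportConfiguration>
  </Export>
</Nuix>"""

root = ET.fromstring(SAMPLE)

# Each extra index steps one level down and picks a child by position.
path_tags = [root.tag, root[0].tag, root[0][0].tag, root[0][0][1].tag]
print(path_tags)
```

Indexing depends on element order, so it is best suited to files whose layout you know will not change.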

Remember that the element you index is the parent, and find only looks at that element's direct children. In your case root is Nuix, and its only child is Export.

root[0] is Export, whose children include ExportConfiguration. CaseName and CaseLocation sit inside ExportConfiguration, so calling find('CaseName') on Export returns None, which is what raises your AttributeError.
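The parent/child point can be checked directly. The sketch below uses a cut-down `SAMPLE` string (an assumption for illustration, not the full report) and compares a bare tag name with a path passed to `find`; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down structure: CaseName is a grandchild of Export, as in the report.
SAMPLE = """<Nuix>
  <Export>
    <ExportConfiguration>
      <CaseName>Brooklyn</CaseName>
    </ExportConfiguration>
  </Export>
</Nuix>"""

root = ET.fromstring(SAMPLE)
export = root.find('Export')

# A bare tag name only matches direct children, so this finds nothing.
direct = export.find('CaseName')

# A path walks down through the intermediate element.
by_path = export.find('ExportConfiguration/CaseName')

# './/' searches all descendants, at any depth.
anywhere = root.find('.//CaseName')

print(direct)
print(by_path.text)
print(anywhere.text)
```

The path form avoids relying on element positions, which may be preferable if the report layout varies between versions.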

If you do:

for child in root[0][0]:
    print(child.tag, child.attrib)

This prints the tags of CaseName and its siblings, but find will not work at this level: each child is already CaseName (or one of its siblings), so you would be searching inside CaseName for CaseName.
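To see why going one level too deep also fails, and one way to avoid the NoneType error altogether, here is a small sketch. The `SAMPLE` string and variable names are assumptions for illustration; `findtext` is the standard ElementTree method that returns a default when nothing matches.

```python
import xml.etree.ElementTree as ET

# Cut-down structure with Export as the root, for illustration only.
SAMPLE = """<Export>
  <ExportConfiguration>
    <CaseName>Brooklyn</CaseName>
    <TimeZone>America/Chicago</TimeZone>
  </ExportConfiguration>
</Export>"""

export = ET.fromstring(SAMPLE)
config = export[0]  # ExportConfiguration

# One level too deep: each child of ExportConfiguration is asked
# for a CaseName child of its own, and none of them has one.
too_deep = [child.find('CaseName') for child in config]

# findtext returns the default instead of None when nothing matches.
case_name = config.findtext('CaseName', default='')
missing = config.findtext('NoSuchTag', default='')

print(too_deep, repr(case_name), repr(missing))
```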

Once you have the right parent, finding the children is easy.

This code works.

I have taken the empty lists out of the loop.

I have also changed the append calls: they were passing undefined names (CaseName, CaseLocation) instead of the variables holding the values. I have also indented some appends as they were outside of the loop.

I have left the print statements in so you can see what is going on.

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('summary-report.xml')
root = tree.getroot()

# open the output file for writing; newline='' avoids blank rows on Windows
Resident_data = open('Parse_PythonOutput.csv', 'w', newline='')

# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []
resident = []
address_list = []

count = 0
for member in root[0]:
    if count == 0:

        name = member.find('CaseName').tag
        print(name)
        resident_head.append(name)

        location = member.find('CaseLocation').tag
        print(location)
        resident_head.append(location)

        csvwriter.writerow(resident_head)
        count = count + 1

        name_text = member.find('CaseName').text
        print(name_text)
        resident.append(name_text)

        text_location = member.find('CaseLocation').text
        print(text_location)
        resident.append(text_location)

        print(resident)

csvwriter.writerow(resident)

Resident_data.close()

The CSV data file looks like this:

CaseName,CaseLocation
Brooklyn,C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix
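For comparison, here is a more compact sketch of the same idea that uses a path instead of indexes. It is not the code above, just one possible variant: `write_case_rows`, `FIELDS`, and the in-memory `buffer` are hypothetical names chosen for this example, and the XML is a trimmed copy of the report. With a real file you would pass something like `open('Parse_PythonOutput.csv', 'w', newline='')` instead of the buffer.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the report; a raw string keeps the backslashes literal.
SAMPLE = r"""<Nuix version="7.2.2" architecture="amd64">
  <Export>
    <ExportConfiguration>
      <CaseName>Brooklyn</CaseName>
      <CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>
    </ExportConfiguration>
  </Export>
</Nuix>"""

FIELDS = ['CaseName', 'CaseLocation']  # columns to extract


def write_case_rows(root, out_file, fields=FIELDS):
    """Write a header row plus one row per ExportConfiguration element."""
    writer = csv.writer(out_file)
    writer.writerow(fields)
    for config in root.findall('Export/ExportConfiguration'):
        # findtext yields '' for a missing tag instead of raising on None
        writer.writerow([config.findtext(name, default='') for name in fields])


root = ET.fromstring(SAMPLE)
buffer = io.StringIO()  # stands in for a real output file
write_case_rows(root, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Adding more columns then only means extending `FIELDS`, provided the tags are direct children of ExportConfiguration.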

Upvotes: 1
