Reputation: 33
I am trying to parse an XML file into a CSV file, but I am getting the error below. I tested the same logic on a simpler XML file and it seemed to work. Below are the error, the XML file, my Python code, and my desired output. So far I have only added two of the columns. I have been looking at this for hours, so another set of eyes would be much appreciated. Thank you!
Error:
    name = member.find('CaseName').tag
AttributeError: 'NoneType' object has no attribute 'tag'
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<Nuix version="7.2.2" architecture="amd64">
<Export
startTime="Sun Feb 25 22:07:07 2018 (America/Chicago)"
endTime="Sun Feb 25 22:08:03 2018 (America/Chicago)"
exportDuration="55s"
processingDuration="55s">
<ExportConfiguration>
<LoadFiles>
</LoadFiles>
<MessageFormat>NATIVE</MessageFormat>
<ExportDirectory>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix\OCR cache directory</ExportDirectory>
<SeparateEmailAttachments>false</SeparateEmailAttachments>
<RegenerateNatives>false</RegenerateNatives>
<RegeneratePdfs>false</RegeneratePdfs>
<FindTopLevelItems>false</FindTopLevelItems>
<DescendantItems>false</DescendantItems>
<ExportContainers>false</ExportContainers>
<SortOrder>position</SortOrder>
<CaseName>Brooklyn</CaseName>
<CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>
<TimeZone>America/Chicago</TimeZone>
<Numbering>
<Strategy>Document ID numbering</Strategy>
<DocumentPagesInSameFolder>true</DocumentPagesInSameFolder>
<FamilyDocumentsInSameFolder>false</FamilyDocumentsInSameFolder>
<FirstItemNumber>DOC-000000001</FirstItemNumber>
</Numbering>
<Imaging>
<ImagingProfile>Default</ImagingProfile>
</Imaging>
<Naming>
<NativeNamingScheme>Page only</NativeNamingScheme>
<PdfNamingScheme>Page only</PdfNamingScheme>
</Naming>
<OcrSettings>
<Recognition>High Quality - Slow</Recognition>
<Deskewed/>
<UpdateTextStore append="true"/>
<Rotation>Auto</Rotation>
<Languages>English</Languages>
</OcrSettings>
<ResemblanceThreshold>0.85</ResemblanceThreshold>
</ExportConfiguration>
<ExportStatistics>
<SelectedItems>4</SelectedItems>
<ExcludedCount>0</ExcludedCount>
<TotalItemsToExport>4</TotalItemsToExport>
<FailedItems>0</FailedItems>
<DocumentNumbers>
<First></First>
<Last></Last>
</DocumentNumbers>
</ExportStatistics>
<ExportStageDetails>
<Stage
name="WORK_QUEUE"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="NATIVE"
successfulItems="4"
failedItems="0"
duration="33s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="STORED_EMAIL_FIXUP"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="PDF"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="BINARY_STORE"
successfulItems="0"
failedItems="0"
duration="0s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="OCR_INITIALISATION"
successfulItems="4"
failedItems="0"
duration="0s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="OCR"
successfulItems="4"
failedItems="0"
duration="17s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="POST_OCR"
successfulItems="4"
failedItems="0"
duration="0s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="TEXT_REPLACEMENT"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
</ExportStageDetails>
<FileStatistics>
<NativeFilesExported>3</NativeFilesExported>
<NativeFilesFromStore>0</NativeFilesFromStore>
<NativeFilesExportedInline>0</NativeFilesExportedInline>
<NativeFilesExportedParallel>3</NativeFilesExportedParallel>
<NativeFilesExportedParallelLocal>0</NativeFilesExportedParallelLocal>
<NativeFilesWithInvalidTimes>0</NativeFilesWithInvalidTimes>
<NativePlaceHolderFilesExported>0</NativePlaceHolderFilesExported>
<NativeFilesRegenerated>0</NativeFilesRegenerated>
<TextFilesExported>0</TextFilesExported>
<TextPlaceHolderFilesExported>0</TextPlaceHolderFilesExported>
<PdfFilesExported>0</PdfFilesExported>
<PdfFilesStamped>0</PdfFilesStamped>
<TiffFilesExported>0</TiffFilesExported>
<PdfDetails>
<PdfFilesFromStore>0</PdfFilesFromStore>
<PdfFilesRegenerated>0</PdfFilesRegenerated>
<PdfFilesExportedInline>0</PdfFilesExportedInline>
<PdfFilesExportedParallel>0</PdfFilesExportedParallel>
<PdfFilesExportedParallelLocal>0</PdfFilesExportedParallelLocal>
<UserImportedPdfs>0</UserImportedPdfs>
<PrintedPdfs>0</PrintedPdfs>
<UnformattedTextPdfs>0</UnformattedTextPdfs>
<ItemEncryptedPdfs>0</ItemEncryptedPdfs>
<UnprintableItemPdfs>0</UnprintableItemPdfs>
</PdfDetails>
</FileStatistics>
<PageCountStatistics>
<PdfPages>0</PdfPages>
<StampedPages>0</StampedPages>
<FailedStampedPages>0</FailedStampedPages>
<AveragePageCount>0.0</AveragePageCount>
</PageCountStatistics>
<ThroughputStatistics>
<NativeDocRate>0.0857363321997085</NativeDocRate>
<PdfDocRate>0.0</PdfDocRate>
<StampedDocRate>0.0</StampedDocRate>
<PdfPageRate>0.0</PdfPageRate>
<StampingPageRate>0.0</StampingPageRate>
</ThroughputStatistics>
<MimeTypeStatistics>
<MimeTypes>
<MimeType name="application/pdf" count="4" />
</MimeTypes>
</MimeTypeStatistics>
</Export>
</Nuix>
Python Code:
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('D:\\Users\\eferse\\Desktop\\XML_parsing\\summary-report.xml')
root = tree.getroot()
# open a file for writing
Resident_data = open('D:\\Users\\eferse\\Desktop\\XML_parsing\\Nuix Export XML Parse_PythonOutput.csv', 'w')
# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []
count = 0
for member in root.findall('Export'):
    resident = []
    address_list = []
    if count == 0:
        name = member.find('CaseName').tag
        resident_head.append(CaseName)
        location= member.find('CaseLocation').tag
        resident_head.append(CaseLocation)
        csvwriter.writerow(resident_head)
        count = count + 1
    name = member.find('CaseName').text
    resident.append(CaseName)
    location= member.find('CaseLocation').text
    resident.append(CaseLocation)
    csvwriter.writerow(resident)
Resident_data.close()
Desired Output: [image: Output]
Upvotes: 1
Views: 115
Reputation: 2211
I used indexing to reach the child elements in question, which is often easier when you know where the information sits in the tree.
You can check what is at a given level with the following:
for child in root[0]:
    print(child.tag, child.attrib)
You can navigate further by chaining indexes as deep as you like, for example root[0][0][1].
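To make the indexing concrete, here is a minimal sketch against a trimmed-down copy of the report. SAMPLE is a stand-in made up for illustration: it keeps the nesting of the real file but only a few of its tags. Each extra index steps one level down to a child, selected by position.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the report (same nesting, fewer tags).
SAMPLE = """<Nuix version="7.2.2">
  <Export exportDuration="55s">
    <ExportConfiguration>
      <LoadFiles></LoadFiles>
      <MessageFormat>NATIVE</MessageFormat>
      <CaseName>Brooklyn</CaseName>
    </ExportConfiguration>
    <ExportStatistics>
      <SelectedItems>4</SelectedItems>
    </ExportStatistics>
  </Export>
</Nuix>"""

root = ET.fromstring(SAMPLE)

print(root.tag)            # Nuix
print(root[0].tag)         # Export
print(root[0][0].tag)      # ExportConfiguration
print(root[0][0][1].tag)   # MessageFormat (second child of ExportConfiguration)

# Listing one level at a time shows which index holds which element.
export_children = [child.tag for child in root[0]]
print(export_children)     # ['ExportConfiguration', 'ExportStatistics']
```

Indexing by position is convenient for exploring, but it depends on the order of the elements, so it may break if the layout of the report changes.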
Remember that the index selects the parent, and find searches only that parent's direct children. In your case root is Nuix, and its only child is Export. root[0] is Export, so calling find on it searches the children of Export, which gets you as far as ExportConfiguration. CaseName and CaseLocation, the elements you are looking for, sit one level further down, inside ExportConfiguration.
If you do
for child in root[0][0]:
    print(child.tag, child.attrib)
this prints the tags of the children of ExportConfiguration, CaseName among them, but find will not work on those children: you would be searching inside CaseName for CaseName.
Once you have the right parent, finding its children is easy.
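The point about parents and children can be checked directly. The sketch below uses a made-up, shortened SAMPLE with the same nesting as the report, and shows that find on Export returns None for a grandchild, which is where the AttributeError in the question comes from. It also shows two alternatives to indexing that ElementTree supports: spelling out the path, or using './/' to search all descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the report (same nesting, fewer tags).
SAMPLE = r"""<Nuix version="7.2.2">
  <Export exportDuration="55s">
    <ExportConfiguration>
      <CaseName>Brooklyn</CaseName>
      <CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>
    </ExportConfiguration>
    <ExportStatistics>
      <SelectedItems>4</SelectedItems>
    </ExportStatistics>
  </Export>
</Nuix>"""

root = ET.fromstring(SAMPLE)       # root is <Nuix>
export = root.find('Export')       # <Export> is a direct child of <Nuix>

# find() only looks at direct children, so a grandchild is not found.
direct = export.find('CaseName')
print(direct)                      # None

# Spelling out the path from <Export> reaches the grandchild.
by_path = export.find('ExportConfiguration/CaseName')
print(by_path.text)                # Brooklyn

# './/' searches all descendants, at any depth.
anywhere = export.find('.//CaseLocation')
print(anywhere.text)
```

Calling .tag or .text on the None result is what raises "'NoneType' object has no attribute 'tag'".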
The code below works.
I moved the empty lists out of the loop.
I also changed the values passed to append, because they referred to names that were never defined (CaseName, CaseLocation) instead of the variables holding the results. I also indented some appends that were outside the loop.
I left the print statements in so you can see what is going on.
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('summary-report.xml')
root = tree.getroot()
# 'a' appends, so running the script twice adds the rows twice
Resident_data = open('Parse_PythonOutput.csv', 'a')
# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []
resident = []
address_list = []
count = 0
# root[0] is Export; its first child, ExportConfiguration, holds the fields
for member in root[0]:
    # only the first child is processed; the later children of Export
    # do not contain CaseName or CaseLocation
    if count == 0:
        # header row: the tag names
        name = member.find('CaseName').tag
        print(name)
        resident_head.append(name)
        location = member.find('CaseLocation').tag
        print(location)
        resident_head.append(location)
        csvwriter.writerow(resident_head)
        count = count + 1
        # data row: the text inside each tag
        name_text = member.find('CaseName').text
        print(name_text)
        resident.append(name_text)
        text_location = member.find('CaseLocation').text
        print(text_location)
        resident.append(text_location)
        print(resident)
        csvwriter.writerow(resident)
Resident_data.close()
The CSV data file looks like this:
CaseName,CaseLocation
Brooklyn,C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix
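As an alternative to the counter and the indexes, the same two columns can be pulled out by name. This is only a sketch under a few assumptions: SAMPLE is a made-up, shortened copy of the report, write_case_rows and FIELDS are hypothetical names, and the output goes to an in-memory buffer so the example runs without touching the disk.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the report (same nesting, fewer tags).
SAMPLE = r"""<Nuix version="7.2.2">
  <Export>
    <ExportConfiguration>
      <CaseName>Brooklyn</CaseName>
      <CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>
    </ExportConfiguration>
  </Export>
</Nuix>"""

FIELDS = ['CaseName', 'CaseLocation']  # columns to pull out, in order


def write_case_rows(root, out):
    """Write a header plus one CSV row per <Export> to the file-like `out`."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for export in root.findall('Export'):
        config = export.find('ExportConfiguration')
        if config is None:
            continue  # skip an <Export> that has no configuration block
        # findtext returns the default ('') when a tag is missing,
        # so a missing column does not raise AttributeError
        writer.writerow([config.findtext(tag, default='') for tag in FIELDS])


buffer = io.StringIO()
write_case_rows(ET.fromstring(SAMPLE), buffer)
rows = buffer.getvalue().splitlines()
print(rows[0])  # CaseName,CaseLocation
print(rows[1])
```

To write a real file on Python 3, pass a file opened with open(path, 'w', newline='') instead of the buffer; the csv documentation recommends newline='' so that no extra blank lines appear between rows on Windows.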
Upvotes: 1