Which Python data structure should I use?

Question

Could someone recommend the best data structure for the FinalResults described below:

I'm extracting various pieces of information from XML documents. Roughly, here's what I do: first, I use find_all to locate the text elements that contain a keyword. Then, for each result, I:

- get the parent tag of the text element,
- get an attribute of that parent, and
- search the contents of the text element for additional text using a regex.

This last search yields a result with up to 6 match groups.
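The steps above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code. It uses the standard library's xml.etree.ElementTree instead of BeautifulSoup's find_all so that it runs without third-party packages. ElementTree elements have no parent pointer, so the sketch builds a child-to-parent map. The sample XML, the keyword, and the six-group pattern `CITATION_RE` are hypothetical stand-ins for the real documents and regex.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real XML layout is not shown in the question.
SAMPLE_XML = """
<doc>
  <paragraph id="39871234"><text>The keyword appears here, see 42 103(b)(1).</text></paragraph>
  <paragraph id="55500001"><text>No match in this one.</text></paragraph>
</doc>
"""

# Hypothetical pattern with six groups: volume, section, and up to four
# parenthesized subdivisions. Groups that do not participate come back as None.
CITATION_RE = re.compile(
    r"(\d+) (\d+)(?:\((\w+)\))?(?:\((\w+)\))?(?:\((\w+)\))?(?:\((\w+)\))?"
)


def extract_results(xml_string, keyword):
    """Return one tuple per matching text element:
    (parent tag, parent id, group 1, ..., group 6)."""
    root = ET.fromstring(xml_string.strip())
    # ElementTree elements do not know their parent, so map child -> parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    results = []
    for elem in root.iter("text"):
        text = elem.text or ""
        if keyword not in text:
            continue
        match = CITATION_RE.search(text)
        if match is None:
            continue
        parent = parent_of[elem]
        results.append((parent.tag, parent.get("id"), *match.groups()))
    return results


print(extract_results(SAMPLE_XML, "keyword"))
# [('paragraph', '39871234', '42', '103', 'b', '1', None, None)]
```

Note that `parent.get("id")` returns None if the parent has no id attribute, so that slot can also hold None.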

This whole operation could end up returning something like this:

FinalResult 1: [parent, parent-attr, match.group(1), match.group(2), ..., match.group(6)]

FinalResult 2: [parent, parent-attr, match.group(1), match.group(2), ..., match.group(6)]

There is no maximum number of FinalResults that I might get, but on average I expect fewer than 10 from each XML doc. I plan to use each FinalResult for other processing, but I won't be changing or adding anything in the FinalResults. For example, I might go back to the parent tag with attribute XYZ and get other data, then get a file named after match.group(2) from elsewhere.
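Because the FinalResults are never modified and their number is unbounded, one reasonable option (a sketch, not the only possible answer) is a list that grows as results are found, holding one immutable tuple per result. The first tuple is the example FinalResult from this question; the second is made up for illustration.

```python
# A growable list of fixed-shape, immutable tuples.
final_results = []
final_results.append(("paragraph", "39871234", "42", "103", "b", "1", None, None))
# Hypothetical second result, for illustration only.
final_results.append(("paragraph", "39871299", "42", "104", "a", None, None, None))

for tag, attr, *groups in final_results:
    # groups holds match.group(1) through match.group(6);
    # groups[1] is match.group(2), e.g. the name of a file to fetch.
    print(tag, attr, groups[1])

# Tuples cannot be changed by accident after creation.
try:
    final_results[0][2] = "43"
except TypeError as exc:
    print("immutable:", exc)
```

The list handles "no maximum number", and the tuples document that each result is read-only.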

I'll probably access each FinalResult only a few times. If it matters, some of the match groups could be None.
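Optional groups that don't participate in the match come back as None from `match.groups()`. Storing None in a tuple or list is harmless, but it needs handling when the groups are combined into a string. A minimal sketch, using a hypothetical six-group pattern (volume, section, up to four subdivisions):

```python
import re

# Hypothetical pattern: volume, section, then up to four optional subdivisions.
pattern = re.compile(
    r"(\d+) (\d+)(?:\((\w+)\))?(?:\((\w+)\))?(?:\((\w+)\))?(?:\((\w+)\))?"
)

match = pattern.search("see 42 103(b)(1)")
groups = match.groups()
print(groups)  # ('42', '103', 'b', '1', None, None)

# Option 1: drop the None entries when building a path.
path = "/".join(g for g in groups if g is not None)
print(path)  # 42/103/b/1

# Option 2: substitute a default for unmatched groups.
print(match.groups(default=""))  # ('42', '103', 'b', '1', '', '')
```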

Here's an example. Assume this is FinalResult[0]: ['paragraph', '39871234', '42', '103', 'b', '1', None, None]

'paragraph' is the parent tag of the tag containing the keywords I found. '39871234' is the id attribute of that paragraph tag. '42' indicates a volume number, '103' is a section in that volume, and 'b' and '1' are subdivisions of that section.
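If positional indexes like `result[3]` become hard to read, a `typing.NamedTuple` keeps the immutability and low overhead of a tuple while naming each field after the meanings described above. This is a sketch; the class name and field names are hypothetical.

```python
from typing import NamedTuple, Optional


class FinalResult(NamedTuple):
    """One keyword hit. Class and field names are illustrative only."""
    parent_tag: str              # e.g. 'paragraph'
    parent_id: str               # the parent's id attribute
    volume: str                  # match.group(1)
    section: str                 # match.group(2)
    sub1: Optional[str] = None   # match.group(3) ... match.group(6)
    sub2: Optional[str] = None
    sub3: Optional[str] = None
    sub4: Optional[str] = None


result = FinalResult("paragraph", "39871234", "42", "103", "b", "1")
print(result.volume, result.section)  # 42 103
# A NamedTuple is still a tuple, so it compares equal to the plain form.
print(result == ("paragraph", "39871234", "42", "103", "b", "1", None, None))  # True
```

Given a `parent` element and a regex `match`, an instance could be built with `FinalResult(parent_tag, parent_id, *match.groups())`.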

I would use 42/103/b/1 to extract information from another XML file. The paragraph tag and its id would let me tell one keyword search result from another, because the file will have multiple text elements (e.g., paragraph id=39871234 text [string containing keyword]).
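Because the parent tag and id are what tell one result from another, a dictionary keyed on that pair can complement a plain list when results need to be looked up rather than only iterated. A minimal sketch; the helper `results_by_parent` and the second result are hypothetical.

```python
def results_by_parent(final_results):
    """Index result tuples by (parent tag, parent id).

    Each value is a list, since one paragraph could hold several keyword hits.
    """
    index = {}
    for result in final_results:
        key = (result[0], result[1])
        index.setdefault(key, []).append(result)
    return index


final_results = [
    ("paragraph", "39871234", "42", "103", "b", "1", None, None),
    ("paragraph", "39871299", "42", "104", "a", None, None, None),  # hypothetical
]

index = results_by_parent(final_results)
hit = index[("paragraph", "39871234")][0]
print(hit[2:4])  # ('42', '103')
print(("paragraph", "00000000") in index)  # False
```

The list still preserves document order; the dict is only a lookup aid built from it.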

My question is: should I store all the FinalResults as a dictionary, a list, a tuple, or something else?
