dazedconfused

Reputation: 1342

Parsing data with pattern

I'm parsing some data with the pattern as follows:

tagA:
    titleA
    dataA1
    dataA2
    dataA3
    ...

tagB:
    titleB
    dataB1
    dataB2
    dataB3
    ...

tagC:
    titleC
    dataC1
    dataC2
    ...
...

These tags are stored in a list, list_of_tags. Iterating over the list yields the tags, and iterating over a tag yields its title followed by the data associated with that title.

The tags in my data are essentially something like <div>, so they are not useful to me. What I'm trying to do is construct a dictionary that uses the titles as keys and a list of the data items as values.

The constructed dictionary would look like:

{
    titleA: [dataA1, dataA2, dataA3...],
    titleB: [dataB1, dataB2, dataB3...],
    ...
}

Note that every tag contains exactly one title and some data items, and the title always comes before the data.

So here is my working code:

Method 1:

result = {}
for tag in list_of_tags:
    list_of_values = []
    for idx, elem in enumerate(tag):
        if not idx:
            key = elem  # the first element is the title
        else:
            list_of_values.append(elem)  # construct_list_of_values()
    result[key] = list_of_values  # update_the_dictionary()

Method 1 works fine and gives me the desired result. However, when I put that code in PyCharm, it warns me that "Local variable 'key' might be referenced before assignment" at the last line. Hence, I tried another approach:

Method 2:

result = {tag[0]: tag[1:] for tag in list_of_tags}

Method 2 works fine if the tags are lists, but I also want the code to work if the tags are generators (with method 2, generators raise "'generator' object is not subscriptable").

To work with generators, I came up with:

Method 3:

key_val_list = [(next(tag), list(tag)) for tag in list_of_tags]
result = dict(key_val_list)

Method 3 also works, but I cannot write it as a dictionary comprehension: {next(tag): list(tag) for tag in list_of_tags} gives a StopIteration exception because list(tag) is evaluated first. (That is the evaluation order before Python 3.8; from 3.8 on, the key expression is evaluated first.)

So my question is: is there an elegant way of dealing with this pattern that works whether the tags are lists or generators? (Method 1 seems to work for both, but I don't know whether I should ignore the warning PyCharm gives. The other two methods look more concise, but one only works on lists while the other only works on generators.)

Sorry for the long question, thanks for the patience!

Upvotes: 1

Views: 112

Answers (1)

Siyu Song

Reputation: 917

I guess the reason PyCharm gives you the warning is that you use key in update_the_dictionary, but key would be left unassigned if tag did not contain at least one element. You may know that the title will always be present, but the static analyzer cannot infer that from the context.
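If you want to keep an explicit loop, one way to make the assignment of the key unconditional is to pull the first element with next() and handle the empty case yourself. This is only a minimal sketch: the function name build_result and the choice to skip empty tags are my assumptions, not something from your code.

```python
def build_result(list_of_tags):
    """Map each tag's first element (title) to a list of the remaining elements."""
    result = {}
    for tag in list_of_tags:
        it = iter(tag)  # works for lists and generators alike
        try:
            key = next(it)  # the title is the first element
        except StopIteration:
            continue  # assumption: an empty tag is simply skipped
        result[key] = list(it)  # everything after the title
    return result
```

Because key is either assigned or the iteration moves on, there is no path on which it is used before assignment.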

If you are using Python 3, you might want to try using PEP 3132 - Extended Iterable Unpacking. It should work for both lists and generators.

e.g.

title, *data = tag
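The same unpacking can also be used as the loop target of a dictionary comprehension, which should give a one-liner that accepts both lists and generators. A small sketch, where tags_to_dict and the sample data are made up for illustration:

```python
def tags_to_dict(list_of_tags):
    # "title, *data" unpacks each tag: first element, then the rest as a list
    return {title: data for title, *data in list_of_tags}


list_tags = [["titleA", "dataA1", "dataA2"], ["titleB", "dataB1"]]
gen_tags = [(item for item in tag) for tag in list_tags]  # same data as generators

print(tags_to_dict(list_tags))
print(tags_to_dict(gen_tags))
# both print {'titleA': ['dataA1', 'dataA2'], 'titleB': ['dataB1']}
```

Note that an empty tag makes the unpacking raise ValueError, so this relies on every tag having a title.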

Upvotes: 1
