carousallie

Reputation: 865

Conditionally create list of lists

I don't know if I'm phrasing this in the best way, but essentially I'd like to create a list of lists based on different conditions.

I read in some XML data using ElementTree and after parsing it, I iterate through the tree and put all of the tags in a list called tags and their values in a list called vals.

Within my list of tags, there are a few sentence tags that I'd like to use as the keys of a dictionary, with the corresponding values collected into a list for each key.

My list of tags, their corresponding values, and the sentence tags look like this:

tags = ['irrel', 'TAG_ONE', 'TAG_ONE', 'TAG_TWO', 'TAG_ONE', 'TAG_TWO', 'irrel']

vals = ['not_rel', 1, 2, 5, 3, 6, 'not_rel']

sent_tags = ['TAG_ONE', 'TAG_TWO']

My ideal output is tags_dict = {'TAG_ONE': [1, 2, 3], 'TAG_TWO': [5, 6]} which I achieved using the code below.

sent_vals = list()

# Make a list of all TAG_ONE values and append list to sentence values list
tag_one = list()
tag_one_locs = [i for i, x in enumerate(tags) if x == 'TAG_ONE']
for t in tag_one_locs:
    tag_one.append(vals[t])
sent_vals.append(tag_one)

# make a list of all TAG_TWO values and append list to sentence values list
tag_two = list()
tag_two_locs = [i for i, x in enumerate(tags) if x == 'TAG_TWO']
for tt in tag_two_locs:
    tag_two.append(vals[tt])
sent_vals.append(tag_two)

tags_dict = dict(zip(sent_tags, sent_vals))

However, this is fairly ugly and just copying and pasting code a million times is impractical as my real data has about 70 sentence tags. I'm drawing a blank on how to simplify the code into some sort of list comprehension (or something else).

Upvotes: 2

Views: 407

Answers (2)

Brian

Reputation: 1604

A dict comprehension:

{sent_tag: [vals[ind] for ind, tag in enumerate(tags) if tag == sent_tag] for sent_tag in sent_tags}

Think of the code like this if the comprehension structure is confusing for you:

output = {}
for sent_tag in sent_tags:
    # Collect every value whose tag matches the current sentence tag
    val_list = []

    for ind, tag in enumerate(tags):
        if tag == sent_tag:
            val_list.append(vals[ind])

    output[sent_tag] = val_list

Either way, your output will be:

{'TAG_ONE': [1, 2, 3], 'TAG_TWO': [5, 6]}
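A possible variant, assuming tags and vals always have the same length: pairing the two lists with zip avoids indexing by position. Note that, like the comprehension above, it still scans the whole list once per sentence tag, so with around 70 sentence tags that is around 70 passes. The name by_tag is only illustrative.

```python
tags = ['irrel', 'TAG_ONE', 'TAG_ONE', 'TAG_TWO', 'TAG_ONE', 'TAG_TWO', 'irrel']
vals = ['not_rel', 1, 2, 5, 3, 6, 'not_rel']
sent_tags = ['TAG_ONE', 'TAG_TWO']

# For each sentence tag, keep the values whose paired tag matches it.
# zip stops at the shorter list, so equal lengths are assumed here.
by_tag = {
    sent_tag: [val for tag, val in zip(tags, vals) if tag == sent_tag]
    for sent_tag in sent_tags
}

print(by_tag)
```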

Upvotes: 1

Jean-François Fabre

Reputation: 140168

Well, you can simplify that greatly using collections.defaultdict(list)

  • zip the tags and values together
  • if a tag matches one of the interesting tags, append its value to that tag's list in the dictionary

like this:

import collections

tags = ['irrel', 'TAG_ONE', 'TAG_ONE', 'TAG_TWO', 'TAG_ONE', 'TAG_TWO', 'irrel']

vals = ['not_rel', 1, 2, 5, 3, 6, 'not_rel']

sent_tags = {'TAG_ONE', 'TAG_TWO'}  # a set is preferred when there are many elements (faster "in" lookup)

tags_dict = collections.defaultdict(list)

for tag, val in zip(tags, vals):
    if tag in sent_tags:
        tags_dict[tag].append(val)

print(dict(tags_dict))  # convert to dict just to print

result (the key order may differ depending on the Python version):

{'TAG_TWO': [5, 6], 'TAG_ONE': [1, 2, 3]}
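One thing to be aware of: with defaultdict, a sentence tag that never occurs in tags gets no key at all. If every sentence tag should be present, with an empty list when nothing matched, a plain dict can be pre-seeded instead. This is only a sketch: the function name group_values_by_tag and the extra tag 'TAG_THREE' are made up for illustration and are not part of the question.

```python
def group_values_by_tag(tags, vals, sent_tags):
    """Group vals by their paired tag, keeping only tags listed in sent_tags."""
    # Pre-seed one empty list per sentence tag so absent tags still get a key.
    grouped = {sent_tag: [] for sent_tag in sent_tags}
    # Single pass over the paired lists; the dict lookup is constant time.
    for tag, val in zip(tags, vals):
        if tag in grouped:
            grouped[tag].append(val)
    return grouped


tags = ['irrel', 'TAG_ONE', 'TAG_ONE', 'TAG_TWO', 'TAG_ONE', 'TAG_TWO', 'irrel']
vals = ['not_rel', 1, 2, 5, 3, 6, 'not_rel']

# 'TAG_THREE' is a hypothetical tag that does not occur in the data.
result = group_values_by_tag(tags, vals, ['TAG_ONE', 'TAG_TWO', 'TAG_THREE'])
print(result)
```

Like the defaultdict loop, this walks the data once no matter how many sentence tags there are.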

Upvotes: 2
