Sachihiro

Reputation: 1793

Dictionary mapping each word to a set of positions in Python

I am trying to find the position of each word in a list of strings and store the result in a dictionary that has the word as the key and, as the value, a set of the list indices where that word appears.

list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x] += 1
        else:
            my_dict[x] = 1
print(my_dict)

This is the code I tried, but it gives me the total number of times each word appears in the list. What I am trying to get is this format:

{'this': {0, 1}, 'is': {0, 1}, 'the': {0, 1}, 'first': {0}, 'second': {1}}

As you can see, 'this' is a key and it appears once in the string at position 0 and once in the string at position 1, and so on for the other words. Any idea how I might get to this point?

Upvotes: 3

Views: 567

Answers (3)

TallChuck

Reputation: 1972

Rather than using integers in your dict, you should use a set:

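list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            # word seen before: record this string's index as well
            my_dict[x].add(i)
        else:
            # first occurrence: start a new set holding this index
            my_dict[x] = {i}
print(my_dict)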

Or, more briefly,

for i in range(len(list_x)):
    for x in list_x[i].split():
        my_dict.setdefault(x, set()).add(i)
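
For reference, here is a minimal self-contained sketch of the setdefault version. The helper name word_positions is hypothetical (it is not in the question) and is only there so the loop can be run and checked on its own:

```python
def word_positions(sentences):
    """Map each word to the set of indices of the strings that contain it."""
    positions = {}
    for i in range(len(sentences)):
        for word in sentences[i].split():
            # setdefault returns the existing set for this word, or stores
            # and returns a new empty set if the word has not been seen yet
            positions.setdefault(word, set()).add(i)
    return positions


list_x = ["this is the first", "this is the second"]
print(word_positions(list_x))
```

setdefault does the "create if missing" step and the lookup in one call, which is why the if/else branch is no longer needed.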

Upvotes: 1

Brad Solomon

Reputation: 40918

You can also do this with defaultdict and enumerate:

from collections import defaultdict
list_x = ["this is the first",
          "this is the second",
          "third is this"]
pos = defaultdict(set)
for i, sublist in enumerate(list_x):
    for word in sublist.split():
        pos[word].add(i)

Output:

>>> from pprint import pprint
>>> pprint(dict(pos))
{'first': {0},
 'is': {0, 1, 2},
 'second': {1},
 'the': {0, 1},
 'third': {2},
 'this': {0, 1, 2}}

The purpose of enumerate is to provide the index (position) of each string within list_x. For each word encountered, the position of its sentence within list_x will be added to the set for its corresponding key in the result, pos.
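
To make that concrete, this small sketch (using the two strings from the question) shows the (index, string) pairs that enumerate produces before any splitting happens:

```python
list_x = ["this is the first", "this is the second"]

# enumerate yields (index, item) tuples, so there is no need for range(len(...))
pairs = list(enumerate(list_x))
print(pairs)
# [(0, 'this is the first'), (1, 'this is the second')]
```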

Upvotes: 1

Felix

Reputation: 1905

I fixed two lines of your code (the two assignments to my_dict[x]):

list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x].append(i)
        else:
            my_dict[x] = [i]
print(my_dict)

Returns:

{'this': [0, 1], 'is': [0, 1], 'the': [0, 1], 'first': [0], 'second': [1]}
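
Note that the values here are lists, not the sets shown in the question. The difference only shows up when a word repeats within one string, as in this small sketch (the input with a repeated 'this' is made up for illustration, and the names as_lists and as_sets are hypothetical):

```python
list_x = ["this is this", "this is the second"]

as_lists = {}
as_sets = {}
for i, sentence in enumerate(list_x):
    for word in sentence.split():
        # a list records the index once per occurrence of the word
        as_lists.setdefault(word, []).append(i)
        # a set records each index at most once
        as_sets.setdefault(word, set()).add(i)

print(as_lists["this"])  # [0, 0, 1]
print(as_sets["this"])   # {0, 1}
```

If you need the sets from the question, you could either use add on a set as in the other answers, or convert afterwards with {word: set(idx) for word, idx in as_lists.items()}.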

Upvotes: 3
