Reputation: 1793
I am trying to get the position of each word in a list of strings and store it in a dictionary, where each key is a word and each value is the set of list indices at which that word appears.
list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x] += 1
        else:
            my_dict[x] = 1
print(my_dict)
This is the code I tried, but it gives me the total number of times each word appears in the list. What I am trying to get is this format:
{'this': {0, 1}, 'is': {0, 1}, 'the': {0, 1}, 'first': {0}, 'second': {1}}
As you can see, 'this' is a key that appears twice: once in the string at index 0 and once in the string at index 1, and so on for the other words. Any idea how I might get to this point?
Upvotes: 3
Views: 567
Reputation: 1972
Rather than using integers in your dict, you should use a set:
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x].add(i)
        else:
            my_dict[x] = {i}
Or, more briefly,
for i in range(len(list_x)):
    for x in list_x[i].split():
        my_dict.setdefault(x, set()).add(i)
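For completeness, here is a self-contained sketch of the setdefault version, assuming the list_x from the question. The function name word_positions is only an illustrative choice and does not come from the original post.

```python
def word_positions(strings):
    """Map each word to the set of indices of the strings that contain it."""
    positions = {}
    for i, text in enumerate(strings):
        for word in text.split():
            # setdefault returns the existing set for this word,
            # or stores a new empty set and returns that.
            positions.setdefault(word, set()).add(i)
    return positions


list_x = ["this is the first", "this is the second"]
result = word_positions(list_x)
print(result)
```

Because the values are sets, a word that occurs several times in the same string still records that index only once.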
Upvotes: 1
Reputation: 40918
You can also do this with defaultdict and enumerate:
from collections import defaultdict

list_x = ["this is the first",
          "this is the second",
          "third is this"]

pos = defaultdict(set)
for i, sublist in enumerate(list_x):
    for word in sublist.split():
        pos[word].add(i)
Output:
>>> from pprint import pprint
>>> pprint(dict(pos))
{'first': {0},
 'is': {0, 1, 2},
 'second': {1},
 'the': {0, 1},
 'third': {2},
 'this': {0, 1, 2}}
The purpose of enumerate is to provide the index (position) of each string within list_x. For each word encountered, the position of its sentence within list_x will be added to the set for its corresponding key in the result, pos.
Upvotes: 1
Reputation: 1905
Fixed two lines:
list_x = ["this is the first", "this is the second"]
my_dict = {}
for i in range(len(list_x)):
    for x in list_x[i].split():
        if x in my_dict:
            my_dict[x].append(i)
        else:
            my_dict[x] = [i]
print(my_dict)
Returns:
{'this': [0, 1], 'is': [0, 1], 'the': [0, 1], 'first': [0], 'second': [1]}
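Note that this version stores lists rather than the sets shown in the question, and a list would record the same index twice if a word repeated within one string. If the set format matters, one possible follow-up is to convert each list with a dict comprehension. This is a small sketch that assumes my_dict holds the list-based result above; the name as_sets is made up for illustration.

```python
# Assumed input: the list-based result shown above.
my_dict = {'this': [0, 1], 'is': [0, 1], 'the': [0, 1], 'first': [0], 'second': [1]}

# Convert each list of indices to a set, which also drops duplicate indices.
as_sets = {word: set(indices) for word, indices in my_dict.items()}
print(as_sets)
```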
Upvotes: 3