Reputation: 1
Input: a list of strings such as ['who are they', 'are you there?', 'Yes! you be there']
Output: a dictionary that maps each word appearing in any string to the set of ids of all strings that contain that word.
output = {'who':[1], 'are':[1,2], 'they':[1], 'you':[2,3], 'there':[2,3], 'Yes':[3], 'be':[3]}
I am stuck, please help. I have been unable to write a function that does this.
Upvotes: 0
Views: 809
Reputation: 423
How about this fun solution, which counts how often each word occurs:
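import string
a = ['who are they', 'are you there?', 'Yes! you be there']
x = {}
# Strip punctuation, lowercase everything, then count each word (Python 3).
table = str.maketrans('', '', string.punctuation)
for word in ' '.join(a).translate(table).lower().split():
    try:
        x[word] += 1
    except KeyError:
        x[word] = 1
print(x)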
Upvotes: 0
Reputation: 7822
I would solve this problem like this:
def toDict(l):
    # Assign each sentence a 1-based id.
    ids, output, i = {}, {}, 1
    for sentence in l:
        ids[sentence] = i
        i += 1
    # For every word, collect the ids of the sentences it appears in.
    for sentence in l:
        words = sentence.split(" ")
        for word in words:
            if word in output:
                output[word].append(ids[sentence])
            else:
                output[word] = []
                output[word].append(ids[sentence])
    return output
which returns:
{'be': [3], 'there': [3], 'who': [1], 'Yes!': [3], 'there?': [2], 'are': [1, 2], 'they': [1], 'you': [2, 3]}
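Note that punctuation stays attached to the words here ('Yes!' and 'there?' become separate keys). A minimal sketch of one way to get the keys the question asks for, assuming it is acceptable to strip leading and trailing punctuation while keeping the original case; the name to_dict and its details are illustrative, not part of the answer above:

```python
import string

def to_dict(sentences):
    """Map each word to the 1-based ids of the sentences containing it."""
    output = {}
    for sentence_id, sentence in enumerate(sentences, 1):
        for token in sentence.split():
            # Drop punctuation at either end of the token, e.g. 'there?' -> 'there'.
            word = token.strip(string.punctuation)
            if not word:
                continue  # the token was punctuation only
            ids = output.setdefault(word, [])
            # Record each sentence id at most once per word.
            if sentence_id not in ids:
                ids.append(sentence_id)
    return output

result = to_dict(['who are they', 'are you there?', 'Yes! you be there'])
print(result)
# {'who': [1], 'are': [1, 2], 'they': [1], 'you': [2, 3], 'there': [2, 3], 'Yes': [3], 'be': [3]}
```

Using enumerate() instead of a dictionary keyed by sentence text also means two identical sentences in the input still get distinct ids.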
Upvotes: 1
Reputation: 1121486
Use a collections.defaultdict object to gather your ids, and enumerate() to generate them:
from collections import defaultdict

# inputlist is the list of strings from the question
output = defaultdict(list)
for index, sentence in enumerate(inputlist):
    for word in sentence.lower().split():
        output[word.strip('!?. ')].append(index)
Note that I lowercase the sentence and strip any leftover punctuation.
Result:
defaultdict(<class 'list'>, {'are': [0, 1], 'they': [0], 'be': [2], 'who': [0], 'yes': [2], 'there': [1, 2], 'you': [1, 2]})
This uses 0-based indexing (like most things in Python). If you need to count from 1, tell enumerate() to start counting from there:
for index, sentence in enumerate(inputlist, 1):
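Putting the pieces together, here is a minimal runnable sketch. It assumes the question's list is bound to inputlist, and it swaps list for set, since the question asks for a set of ids; that way a word repeated within one sentence records its id only once. The name plain is introduced here purely for illustration:

```python
from collections import defaultdict

inputlist = ['who are they', 'are you there?', 'Yes! you be there']

# A set per word, so each sentence id is stored at most once.
output = defaultdict(set)
for index, sentence in enumerate(inputlist, 1):  # ids start at 1
    for word in sentence.lower().split():
        output[word.strip('!?. ')].add(index)

# Convert to a plain dict of sorted lists for easier display.
plain = {word: sorted(ids) for word, ids in output.items()}
print(plain)
# {'who': [1], 'are': [1, 2], 'they': [1], 'you': [2, 3], 'there': [2, 3], 'yes': [3], 'be': [3]}
```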
Upvotes: 7