Ali

Reputation: 1

Make a dictionary that indexes the words of a list of strings in Python 3

Input: a list of strings such as ['who are they','are you there?','Yes! you be there']

Output: a dictionary that maps each word appearing in any string to the set of ids of all strings containing that word.

output = {'who':[1], 'are':[1,2], 'they':[1], 'you':[2,3], 'there':[2,3], 'Yes':[3], 'be':[3]}

I am stuck, please help. I am unable to write a function that does this.

Upvotes: 0

Views: 809

Answers (3)

John Doe

Reputation: 423

How about this fun solution:

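import string

a = ['who are they', 'are you there?', 'Yes! you be there']
x = {}
# Python 3: str.translate() takes a table; this one deletes all punctuation
table = str.maketrans('', '', string.punctuation)
for word in ' '.join(a).translate(table).lower().split():
    # x maps each word to the number of times it occurs
    try:
        x[word] += 1
    except KeyError:
        x[word] = 1
print(x)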
  • join() the list of strings into one string, since you don't care how the words are organized
  • translate() to remove punctuation
  • lower() to convert all characters to lower case, so that "Yes" and "yes" are not treated differently
  • split() the string into words
  • try/except to code-golf your way around the longer if statement
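As a rough Python 3 sketch of the same join/translate/lower/split steps, collections.Counter can stand in for the try/except bookkeeping. The names sentences, table and counts are only illustrative. Note that this counts how often each word occurs across all strings; it does not record which strings contain the word.

```python
import string
from collections import Counter

sentences = ['who are they', 'are you there?', 'Yes! you be there']

# Translation table that deletes every ASCII punctuation character
table = str.maketrans('', '', string.punctuation)

# Join, strip punctuation, lowercase, split into words, then count them
counts = Counter(' '.join(sentences).translate(table).lower().split())
print(counts)
```

Counter is a dict subclass, so counts['are'] works just like x['are'] in the loop version.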

Upvotes: 0

Pawel Miech

Reputation: 7822

I would solve this problem like this:

def toDict(l):
    output = {}
    # enumerate(l, 1) numbers the sentences starting from 1
    for i, sentence in enumerate(l, 1):
        for word in sentence.split(" "):
            # setdefault() creates the empty list the first time a word is seen
            output.setdefault(word, []).append(i)
    return output

which returns:

 {'be': [3], 'there': [3], 'who': [1], 'Yes!': [3], 'there?': [2], 'are': [1, 2], 'they': [1], 'you': [2, 3]}

Upvotes: 1

Martijn Pieters

Reputation: 1121486

Use a collections.defaultdict object to gather your ids, and enumerate() to generate them:

from collections import defaultdict

output = defaultdict(list)

for index, sentence in enumerate(inputlist):
    for word in sentence.lower().split():
        output[word.strip('!?. ')].append(index)

Note that I lowercase the sentence and strip leading and trailing punctuation (!, ? and .) from each word.

Result:

defaultdict(<class 'list'>, {'are': [0, 1], 'they': [0], 'be': [2], 'who': [0], 'yes': [2], 'there': [1, 2], 'you': [1, 2]})

This uses 0-based indexing (like everything in Python). If you have to count from 1, tell enumerate() to start counting from there:

for index, sentence in enumerate(inputlist, 1):
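Putting the pieces together, a minimal self-contained sketch might look like the following. The function name build_index and its start parameter are my own illustrative choices, not part of the question. It also assumes you want each id listed only once per word, so it collects ids in a set (as the question's wording suggests) and converts them to sorted lists at the end.

```python
from collections import defaultdict

def build_index(sentences, start=1):
    """Map each lowercased word to the sorted ids of the sentences containing it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences, start):
        for word in sentence.lower().split():
            word = word.strip('!?. ')
            if word:  # skip tokens that consisted only of punctuation
                index[word].add(sentence_id)
    # Convert to a plain dict of sorted lists for readable output
    return {word: sorted(ids) for word, ids in index.items()}

inputlist = ['who are they', 'are you there?', 'Yes! you be there']
result = build_index(inputlist)
print(result)
```

Using a set means a word repeated within one sentence still yields a single id for that sentence; with a list it would be appended once per occurrence.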

Upvotes: 7
