Mehul

Reputation: 7

Return a dictionary with distinct words as keys and the positions of their occurrence in the text as values

Say I have a string

"I say what I mean. I mean what I say. i do."

I am trying to write a function which will return a dictionary that will look like the following:

{'i':[0,1,2],'say':[0,1],'what':[0,1],'mean':[0,1],'do':[2]}

It takes each word, only once, as a key in the dictionary, and the value for that key lists the sentences the word appears in. For example, the word "mean" appears in the first [0] and second [1] sentences. The word "do", on the other hand, only appears in the third sentence, hence the:

'do':[2]

in the output.

This is the code I have come up with, after trying every change I could think of to get a list as the value paired with each key.

def wordsD(text):
    #split each sentences at '.'
    myList = text.lower().split('.')
    #declare empty dictionary for the counter
    myDict = {}
    counterList = []
    for sentence in myList:
        words = sentence.split()
        for word in words:
            index = words.index(word)
            counterList.append(index)
            if word not in myDict:
                myDict[word] = list()
                myDict[word].append(index)
            else:
                myDict[word]= list()
                myDict[word].append(index)

    return myDict


text=('I say what I mean. I mean what I say. i do.')
print(wordsD(text))

And this is the output I get:

{'mean': [1], 'what': [2], 'say': [4], 'i': [0], 'do': [1]}

But now I am not sure whether I misunderstood the question or I am missing something in my code. Any help would be great! Even a pointer in the right direction would help, since I am coming up blank even when I try to write pseudocode for this problem. Thank you!

I looked at "Counting vowels" and "Turning a text file with words and their positions into a sentence", but I still can't figure out how to make a list the value for each key.

Upvotes: 1

Views: 8528

Answers (5)

Abhishek Jha

Reputation: 985

Your code goes wrong where it assigns index. In each iteration of the outer loop, words holds the words of a single sentence. For example, in the first iteration:

words = ['i', 'say', 'what', 'i', 'mean']

When you look up the index of a word in that list, you get its position within the sentence, not the sentence number.
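To see that behaviour in isolation, here is a minimal sketch (the variable names are only illustrative). list.index returns the position of the first match inside that one list, so it knows nothing about which sentence the list came from:

```python
# One sentence from the question, already lower-cased and split into words.
words = "i say what i mean".split()

# list.index returns the position of the FIRST occurrence inside this list,
# which is why the repeated 'i' maps to 0 both times.
positions = [words.index(word) for word in words]
print(positions)  # [0, 1, 2, 0, 4]
```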

Instead, keep a counter in the sentence-level loop. You do not need to look up an index at all; just assign the counter value to every word found in that sentence.

index = -1
for sentence in myList:
    words = sentence.split()
    index += 1  # Python has no ++ operator
    for word in words:
        counterList.append(index)
        if word not in myDict:
            myDict[word] = [index]
        elif index not in myDict[word]:
            # append to the existing list instead of replacing it
            myDict[word].append(index)

Upvotes: -1

Dan Oberlam

Reputation: 2496

There are two issues with your code. First, you were making a new list in both the if and the else statements instead of appending to the existing list.

Changing

else:
    myDict[word] = list()
    myDict[word].append(index)

to

else:
    myDict[word].append(index) 

solves that issue.
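A small sketch of why that change matters, using a hypothetical word seen in sentences 0 and 1 (the dictionary names are made up for illustration). Rebinding the key to a fresh list on every pass keeps only the last value, while appending keeps them all:

```python
overwritten = {}
appended = {}

for sentence_no in (0, 1):
    # Rebinding the key to a new list discards whatever was stored before.
    overwritten["mean"] = list()
    overwritten["mean"].append(sentence_no)

    # Creating the list only once lets later appends accumulate.
    if "mean" not in appended:
        appended["mean"] = [sentence_no]
    else:
        appended["mean"].append(sentence_no)

print(overwritten)  # {'mean': [1]}
print(appended)     # {'mean': [0, 1]}
```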

Secondly, your code is tracking the index within a given sentence (i.e. the word position) and not the sentences the word is present in (which your question indicates you want). The following code should fix that issue:

def wordsD(text):
    myList = text.lower().split('.')
    myDict = {}

    for i in range(len(myList)):
        words = myList[i].split()
        for word in words:
            if word not in myDict:
                myDict[word] = [i]
            else:
                if i not in myDict[word]:
                    myDict[word].append(i)

    return myDict

Upvotes: 1

DOSHI

Reputation: 442

This will definitely help you.

string = "I say what I mean. I mean what I say. i do."

DICT = {}

LIST  =  string.split('.')

WORDS = list(set(string.lower().replace('.',"").split()))

LIST = [set((x.lower()).split()) for x in LIST]

for i in range(len(LIST)):
    for item in WORDS:
        if item in LIST[i]:
            DICT.setdefault(item, []).append(i)
print(DICT)

OUTPUT

{'i': [0, 1, 2], 'do': [2], 'say': [0, 1], 'what': [0, 1], 'mean': [0, 1]}

Upvotes: 3

sundar nataraj

Reputation: 8702

def wordsD(text):
    # split the text into sentences at '.'
    myList = text.lower().split('.')
    # declare an empty dictionary for the result
    myDict = {}

    # use enumerate here to get the sentence number
    for senten_no, sentence in enumerate(myList):
        words = sentence.split()
        for word in words:
            if word not in myDict:
                myDict[word] = [senten_no]
            elif senten_no not in myDict[word]:
                myDict[word].append(senten_no)

    return myDict


text = 'I say what I mean. I mean what I say. i do.'
print(wordsD(text))

Each time through the loop you are appending the index of the word rather than the index of the sentence. Use enumerate on the list of sentences: it keeps track of the index for you, so you can append the sentence index instead.
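As a quick illustration of what enumerate yields for the text in the question (a sketch only; the name pairs is made up here), each sentence comes paired with its number. Note that the trailing '.' produces an empty final piece, which contributes no words:

```python
text = "I say what I mean. I mean what I say. i do."

# enumerate yields (index, item) pairs, so no manual counter is needed.
pairs = list(enumerate(text.lower().split('.')))

for sentence_no, sentence in pairs:
    print(sentence_no, sentence.split())
```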

Upvotes: 1

erlc

Reputation: 680

In your code, index currently represents the position of the word in the sentence, not the index of the sentence. Try this:

for index, sentence in enumerate(myList):
 ... 
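One possible way to fill in that loop is sketched below. The function name words_by_sentence is hypothetical, and using set() to drop repeats within a sentence is an assumption about what you want, based on the expected output in the question:

```python
def words_by_sentence(text):
    """Map each distinct word to the list of sentence numbers containing it."""
    result = {}
    for index, sentence in enumerate(text.lower().split('.')):
        # set() removes repeated words, so each sentence number is added once.
        for word in set(sentence.split()):
            result.setdefault(word, []).append(index)
    return result


print(words_by_sentence("I say what I mean. I mean what I say. i do."))
```

The order of the keys in the printed dictionary may vary between runs, but each list of sentence numbers comes out in increasing order because the sentences are visited in order.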

Upvotes: 2
