Reputation: 7
Say I have a string
"I say what I mean. I mean what I say. i do."
I am trying to write a function which will return a dictionary that will look like the following:
{'i':[0,1,2],'say':[0,1],'what':[0,1],'mean':[0,1],'do':[2]}
The function takes each word, only once, as a dictionary key, and the value for that key is the list of sentences in which the word appears. For example, the word "mean" appears in the first [0] and second [1] sentences. On the other hand, the word "do" appears only in the third sentence, hence the:
'do':[2]
in the output.
This is the code I have come up with, after changing everything I could think of to get a list of values paired with each key.
def wordsD(text):
    #split each sentences at '.'
    myList = text.lower().split('.')
    #declare empty dictionary for the counter
    myDict = {}
    counterList = []
    for sentence in myList:
        words = sentence.split()
        for word in words:
            index = words.index(word)
            counterList.append(index)
            if word not in myDict:
                myDict[word] = list()
                myDict[word].append(index)
            else:
                myDict[word]= list()
                myDict[word].append(index)
    return myDict
text=('I say what I mean. I mean what I say. i do.')
print(wordsD(text))
And this is the output I get:
{'mean': [1], 'what': [2], 'say': [4], 'i': [0], 'do': [1]}
But now I am not sure if I understood the question wrong or I am missing something in my code. Any help would be great!! Even a pointer in the right direction would help me out since I am coming up blank even when I try to write pseudocode for this problem. Thank you!
I looked at Counting vowels and Turning a text file with words and their positions into a sentence but I still can't figure out how to make a list the value for each key.
Upvotes: 1
Views: 8528
Reputation: 985
Your code goes wrong where you assign index. In each iteration, words holds the words of a single sentence. For example, in the first iteration:
words = ['i', 'say', 'what', 'i', 'mean']
When you look up the index of a word, you get its position within that sentence, not the sentence number.
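To see this concretely, here is a small sketch (the variable name positions is only for illustration) of what the index lookup returns for the first sentence:

```python
# Words of the first sentence, lower-cased as in the question's code.
words = "i say what i mean".split()

# list.index returns the position of the FIRST match inside this list,
# so it says nothing about which sentence the word came from.
positions = [words.index(word) for word in words]
print(positions)  # [0, 1, 2, 0, 4]
```

Note that the second "i" also reports 0, because index always finds the first occurrence.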
Instead, keep a counter at the level of the sentence loop. You do not need to look up an index at all; just assign the counter value to each word found in the sentence.
index = -1
for sentence in myList:
    words = sentence.split()
    index += 1  # sentence counter (Python has no ++ operator)
    for word in words:
        if word not in myDict:
            # first time this word is seen: start its list
            myDict[word] = [index]
        elif index not in myDict[word]:
            # keep the existing list; record each sentence only once
            myDict[word].append(index)
Upvotes: -1
Reputation: 2496
There are two issues with your code. First, you were making a new list in both the if and the else statements instead of appending to the existing list.
Changing
    else:
        myDict[word] = list()
        myDict[word].append(index)
to
    else:
        myDict[word].append(index)
solves that issue.
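A minimal sketch of the difference, using made-up names (sightings, overwritten, appended) rather than the variables from the question:

```python
# Sentence numbers in which one word was seen, in the order encountered.
sightings = [0, 1, 2]

overwritten = {}
appended = {}
for sentence_no in sightings:
    # Re-creating the list every time keeps only the latest sentence number.
    overwritten["i"] = list()
    overwritten["i"].append(sentence_no)

    # Creating the list once and appending afterwards keeps all of them.
    if "i" not in appended:
        appended["i"] = list()
    appended["i"].append(sentence_no)

print(overwritten)  # {'i': [2]}
print(appended)     # {'i': [0, 1, 2]}
```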
Secondly, your code tracks the index within a given sentence (i.e. the word position) and not the sentences the word appears in (which your question indicates you want). The following code should fix that issue:
def wordsD(text):
    myList = text.lower().split('.')
    myDict = {}
    for i in range(len(myList)):
        words = myList[i].split()
        for word in words:
            if word not in myDict:
                myDict[word] = [i]
            else:
                if i not in myDict[word]:
                    myDict[word].append(i)
    return myDict
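As a possible variation (not part of the code above), the same idea can be sketched with collections.defaultdict and sets, which removes the explicit membership checks. The function name words_by_sentence is hypothetical:

```python
from collections import defaultdict


def words_by_sentence(text):
    """Map each lower-cased word to the sorted sentence numbers containing it."""
    seen = defaultdict(set)
    for sentence_no, sentence in enumerate(text.lower().split('.')):
        for word in sentence.split():
            # A set ignores repeats of a word within the same sentence.
            seen[word].add(sentence_no)
    # Convert the sets back to sorted lists, as the question asks for lists.
    return {word: sorted(numbers) for word, numbers in seen.items()}


result = words_by_sentence("I say what I mean. I mean what I say. i do.")
print(result)
```

For the sample text this should produce the same dictionary the question asks for. The empty string left after the final '.' contributes nothing, because splitting it yields no words.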
Upvotes: 1
Reputation: 442
Here is another way to do it, using sets and dict.setdefault:
string = "I say what I mean. I mean what I say. i do."
DICT = {}
LIST = string.split('.')
WORDS = list(set(string.lower().replace('.',"").split()))
LIST = [set((x.lower()).split()) for x in LIST]
for i in range(len(LIST)):
    for item in WORDS:
        if item in LIST[i]:
            DICT.setdefault(item, []).append(i)
print(DICT)
OUTPUT
{'i': [0, 1, 2], 'do': [2], 'say': [0, 1], 'what': [0, 1], 'mean': [0, 1]}
Upvotes: 3
Reputation: 8702
def wordsD(text):
    #split each sentences at '.'
    myList = text.lower().split('.')
    #declare empty dictionary for the counter
    myDict = {}
    counterList = []
    # use the enumerate here
    for senten_no,sentence in enumerate(myList):
        words = sentence.split()
        for word in words:
            index = words.index(word)
            counterList.append(index)
            if word not in myDict:
                myDict[word] = list()
                myDict[word].append(senten_no)
            else:
                if senten_no not in myDict[word]:
                    myDict[word].append(senten_no)
    return myDict
text=('I say what I mean. I mean what I say. i do.')
print(wordsD(text))
Each time through the loop you are appending the index of the word rather than the index of the sentence. Use enumerate on the list of sentences: it keeps track of the sentence index, so you can append that instead.
Upvotes: 1
Reputation: 680
index currently represents the position of the word in the sentence, not the index of the sentence. Try this:
for index, sentence in enumerate(myList):
    ...
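A quick sketch of what enumerate yields for the sample text from the question (the name pairs is only for illustration):

```python
text = "I say what I mean. I mean what I say. i do."
myList = text.lower().split('.')

# enumerate pairs each sentence with its position in the list.
pairs = list(enumerate(myList))
for index, sentence in pairs:
    print(index, repr(sentence))
```

This prints four pairs, not three: the text ends with a '.', so split leaves an empty string at index 3. It is harmless here because splitting an empty string gives no words.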
Upvotes: 2