Reputation: 81
I am new to programming and I am trying to implement a function that creates a dictionary mapping each term to its inverted list. So, given the collection:
collection = ["apple orange milk" , "bread milk meat apple" , "apple orange"]
For each word in the collection, I want to get the indices of the strings in which it appears. I am trying to get the following result:
inv_ls = {"apple": [0, 1, 2], "orange": [0, 2], "milk": [0, 1], "bread": [1], "meat": [1]}
Upvotes: 0
Views: 69
Reputation: 662
This could also be done using list, set and dict comprehensions:
{  # generate dict
    word: [
        string.split(" ").index(word)  # position of the word within the string
        for string in collection
        if word in string.split(" ")  # only for strings that contain the word
    ]
    for word in {  # generate set with all words
        word for string in collection for word in string.split(" ")
    }
}
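Note that `string.split(" ").index(word)` gives the position of the word inside each string, not the index of the string within the collection. If the goal is the inverted list from the question, a minimal sketch (assuming whitespace-separated words, and using `enumerate` to get each string's index; `split_strings` is a name made up for this example) could look like this:

```python
collection = ["apple orange milk", "bread milk meat apple", "apple orange"]

# Split every string once so the words are not re-split for each lookup.
split_strings = [string.split() for string in collection]

inv_ls = {
    word: [
        idx  # index of the string in the collection
        for idx, words in enumerate(split_strings)
        if word in words
    ]
    for word in {word for words in split_strings for word in words}
}

print(inv_ls["apple"])   # [0, 1, 2]
print(inv_ls["orange"])  # [0, 2]
```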
You should consider using a fallback value so that all lists have the same length. That way, the position in each list corresponds to the index of the string in the original list, and the value can be used to find the word within that string.
{  # generate dict
    word: [
        string.split(" ").index(word)  # get index
        if word in string.split(" ")  # if word in string
        else None  # if not in string use a fallback value instead
        for string in collection
    ]
    for word in {  # generate set with all words
        word for string in collection for word in string.split(" ")
    }
}
This would produce (the order of the keys may vary, since the words come from a set):
{'bread': [None, 0, None], 'apple': [0, 3, 0], 'milk': [2, 1, None], 'orange': [1, None, 1], 'meat': [None, 2, None]}
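To illustrate why equal-length lists are handy, here is a small sketch that treats the list position as the string index and the stored value as the word's position within that string. The names `positions` and `find_word` are made up for this example:

```python
collection = ["apple orange milk", "bread milk meat apple", "apple orange"]

# Same structure as above: one slot per string, None where the word is absent.
positions = {
    word: [
        string.split(" ").index(word) if word in string.split(" ") else None
        for string in collection
    ]
    for word in {word for string in collection for word in string.split(" ")}
}


def find_word(word):
    """Return (string index, position in string) pairs (hypothetical helper)."""
    return [
        (string_idx, pos)
        for string_idx, pos in enumerate(positions.get(word, []))
        if pos is not None  # 0 is a valid position, so compare against None
    ]


print(find_word("milk"))  # [(0, 2), (1, 1)]
```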
Upvotes: 0
Reputation: 607
You got the error because the terms list is:
[['apple', 'orange', 'milk'], ['bread', 'milk', 'meat', 'apple'], ['apple', 'orange']]
which is a list of lists.
So when you try to find 'apple' in that list, Python raises a ValueError, because 'apple' is not itself an element of the outer list. Corrected code could look like this:
def create_index(C):
    index = {}
    terms = []
    # split every string into its list of words
    for element in C:
        terms.append(element.split())
    for element in terms:
        for term in element:
            # create the entry on first sight, then record the string's index
            index.setdefault(term, [])
            index[term].append(terms.index(element))
    return index
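One caveat with `terms.index(element)`: `list.index` returns the position of the first equal item, so if the same string occurs twice in the collection, both occurrences are recorded under the first index. A small sketch of the difference, using a made-up input (`docs`) with a repeated string and `enumerate` as the alternative:

```python
docs = ["apple orange", "milk", "apple orange"]  # made-up input with a duplicate
terms = [doc.split() for doc in docs]

# Lookup with list.index: the third string is reported as index 0.
by_lookup = {}
for element in terms:
    for term in element:
        by_lookup.setdefault(term, []).append(terms.index(element))

# Counting with enumerate: every string keeps its own index.
by_enumerate = {}
for i, element in enumerate(terms):
    for term in element:
        by_enumerate.setdefault(term, []).append(i)

print(by_lookup["apple"])     # [0, 0]
print(by_enumerate["apple"])  # [0, 2]
```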
Upvotes: 0
Reputation: 120399
Split each string into a list of words, then for each word record the index of its string.
If the word has not appeared yet, the setdefault method creates a new entry in the dict with an empty list ([]) as its value.
collection = ["apple orange milk" , "bread milk meat apple" , "apple orange"]
inv_ls = {}
for idx, lst in enumerate([s.split() for s in collection]):
    for item in lst:
        inv_ls.setdefault(item, []).append(idx)
>>> inv_ls
{'apple': [0, 1, 2],
'orange': [0, 2],
'milk': [0, 1],
'bread': [1],
'meat': [1]}
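As an alternative to `setdefault`, the same loop can be written with `collections.defaultdict` from the standard library, which creates the empty list automatically the first time a key is accessed. A sketch that should give the same result for this input:

```python
from collections import defaultdict

collection = ["apple orange milk", "bread milk meat apple", "apple orange"]

inv_ls = defaultdict(list)  # missing keys start out as an empty list
for idx, string in enumerate(collection):
    for item in string.split():
        inv_ls[item].append(idx)

print(dict(inv_ls))
# {'apple': [0, 1, 2], 'orange': [0, 2], 'milk': [0, 1], 'bread': [1], 'meat': [1]}
```

In both versions, a word that occurs more than once in the same string has that string's index appended once per occurrence.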
Upvotes: 1
Reputation: 427
You can update your dictionary directly inside the loop over the elements:
def create_index(C):
    index = {}
    terms = []
    for element in C:
        terms.append(element.split())
    for i, element in enumerate(terms):
        for term in element:
            if term not in index:
                index[term] = [i]
            else:
                index[term].append(i)
    return index

C = ["apple orange milk", "bread milk meat apple", "apple orange"]
index = create_index(C)
print(index)
Upvotes: 1