Reputation: 89
I'm new to Python and I'm trying to read a text file into two dictionaries whose values are lists.
The file contains the following:
term1 doc1 doc3 doc4
term2 doc5 doc1
term3 doc6 doc2
I'm trying to create two dictionaries from the same file: one that has the terms as keys and lists of docs as values, and another that is the opposite.
inverted_index = {}
forward_index = {}
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, doc = items[0], items[1:]
        for doc in items[1:]:
            inverted_index[term] = [doc]
            forward_index[doc] = [term]
print(inverted_index)
print(forward_index)
With what I've done so far, I'm getting the following output:
{'term2': ['doc1'], 'term1': ['doc4'], 'term3': ['doc2']}
{'doc3': ['term1'], 'doc6': ['term3'], 'doc4': ['term1'], 'doc5': ['term2'], 'doc1': ['term2'], 'doc2': ['term3']}
But this is the output I'm looking for:
{'term1': ['doc1','doc3','doc4'], 'term2': ['doc5','doc1'], 'term3': ['doc6','doc2']}
{'doc1': ['term1','term2'], 'doc3': ['term1'], 'doc4': ['term1'], 'doc5': ['term2'], 'doc6': ['term3'], 'doc2': ['term3']}
Please help me to fix this!
Upvotes: 0
Views: 264
Reputation: 5759
As 'coder' suggested, I would also use a defaultdict here. Because the docs may appear more than once across multiple terms, you should use a set to avoid duplicate items:
from collections import defaultdict
inverted_index = defaultdict(set)
forward_index = defaultdict(list)
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term].update(docs)
        for doc in docs:
            forward_index[doc].append(term)
print(inverted_index)
print(forward_index)
(And as Barmar suggests, you only need to update inverted_index once per line, in the outer loop.)
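To illustrate the trade-off between a set and a list here, the following is a minimal sketch rather than part of the original answer. The sample lines are made up, with doc1 deliberately repeated, and the names as_list and as_set are only for illustration.

```python
from collections import defaultdict

# Hypothetical sample lines; "doc1" is repeated on the first line on purpose.
lines = ["term1 doc1 doc3 doc1", "term2 doc5 doc1"]

as_list = defaultdict(list)
as_set = defaultdict(set)

for line in lines:
    term, *docs = line.split()
    as_list[term].extend(docs)  # keeps duplicates and input order
    as_set[term].update(docs)   # drops duplicates; iteration order is not guaranteed

print(as_list["term1"])         # ['doc1', 'doc3', 'doc1']
print(sorted(as_set["term1"]))  # ['doc1', 'doc3']
```

Note that a set does not preserve the order in which the docs appeared, so if order matters a list is probably the better fit.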
Upvotes: 1
Reputation: 782489
You don't need to add to inverted_index in the inner loop; that only needs to be done once for each row. In the inner loop, you need to append to the dictionary entry if it already exists, not overwrite it.
inverted_index = {}
forward_index = {}
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        # One assignment per row: the term maps to all docs on the line.
        inverted_index[term] = docs
        for doc in docs:
            # Create the list on first sight of doc, then append to it.
            forward_index.setdefault(doc, []).append(term)
print(inverted_index)
print(forward_index)
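As a small illustration of the difference between overwriting and appending, here is a sketch using made-up (doc, term) pairs instead of the file. The dictionary names forward_overwrite and forward_append are hypothetical.

```python
# Hypothetical (doc, term) pairs: doc1 is listed under two different terms.
pairs = [("doc1", "term1"), ("doc1", "term2")]

forward_overwrite = {}
forward_append = {}

for doc, term in pairs:
    # Plain assignment replaces whatever list was stored for doc before.
    forward_overwrite[doc] = [term]
    # setdefault returns the existing list, or stores and returns a new empty one.
    forward_append.setdefault(doc, []).append(term)

print(forward_overwrite)  # {'doc1': ['term2']}
print(forward_append)     # {'doc1': ['term1', 'term2']}
```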
Upvotes: 3
Reputation: 12531
inverted_index should not be assigned in the inner for loop, and for forward_index, you replaced the previous value on each iteration of the inner for loop. Try the following code:
inverted_index = {}
forward_index = {}
with open('test') as f:
    for line in f:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term] = docs
        for doc in docs:
            terms = forward_index.get(doc, [])
            terms.append(term)
            forward_index[doc] = terms
print(inverted_index)
print(forward_index)
Upvotes: 1
Reputation: 12992
You could use defaultdict(list) from the collections module, because in your solution the value for a key is overwritten every time that key is updated:
#!/usr/bin/env python
from collections import defaultdict
inverted_index = defaultdict(list)
forward_index = defaultdict(list)
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, doc = items[0], items[1:]
        for doc in items[1:]:
            inverted_index[term].append(doc)
            forward_index[doc].append(term)
print(inverted_index)
print(forward_index)
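For anyone who wants to try this without creating the file, here is a self-contained sketch of the same defaultdict(list) approach. It assumes an in-memory io.StringIO stand-in for term_sample.txt, with contents copied from the question, and wraps the loop in a hypothetical helper named build_indexes. The blank-line check is an addition, not something the question asked for.

```python
import io
from collections import defaultdict

# In-memory stand-in for term_sample.txt (contents copied from the question).
sample = io.StringIO(
    "term1 doc1 doc3 doc4\n"
    "term2 doc5 doc1\n"
    "term3 doc6 doc2\n"
)

def build_indexes(lines):
    """Hypothetical helper: return (inverted_index, forward_index) as plain dicts."""
    inverted_index = defaultdict(list)
    forward_index = defaultdict(list)
    for line in lines:
        items = line.split()
        if not items:  # skip blank lines (an added safeguard)
            continue
        term, docs = items[0], items[1:]
        for doc in docs:
            inverted_index[term].append(doc)
            forward_index[doc].append(term)
    # Convert to plain dicts so printing shows {...} rather than defaultdict(...).
    return dict(inverted_index), dict(forward_index)

inverted_index, forward_index = build_indexes(sample)
print(inverted_index)
print(forward_index)
```

On Python 3.7+, where dicts keep insertion order, this should print the two dictionaries in the form the question asks for.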
Upvotes: 1