Mow1993

Reputation: 89

reading into two dictionaries from same file (python)

I'm new to Python and I'm trying to read a text file into two dictionaries whose values are lists.

The file contains the following:

term1  doc1 doc3 doc4
term2  doc5 doc1
term3  doc6 doc2

I'm trying to create two dictionaries from the same file: one with the terms as keys and lists of docs as values, and the other the opposite (docs as keys, lists of terms as values).

inverted_index = {}
forward_index = {}
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, doc = items[0], items[1:]
        for doc in items[1:]:
            inverted_index[term] = [doc]
            forward_index[doc] = [term]

print(inverted_index)
print(forward_index)

With what I've done so far, I'm getting the following output:

{'term2': ['doc1'], 'term1': ['doc4'], 'term3': ['doc2']}
{'doc3': ['term1'], 'doc6': ['term3'], 'doc4': ['term1'], 'doc5': ['term2'], 'doc1': ['term2'], 'doc2': ['term3']}

but this is the output I'm looking for:

{'term1': ['doc1','doc3','doc4'], 'term2': ['doc5','doc1'], 'term3': ['doc6','doc2']}
{'doc1': ['term1','term2'], 'doc3': ['term1'], 'doc4': ['term1'], 'doc5': ['term2'], 'doc6': ['term3'], 'doc2': ['term3']}

Please help me to fix this!

Upvotes: 0

Views: 264

Answers (4)

Julien

Reputation: 5759

As 'coder' suggested, I would also use a defaultdict here. Because a doc may be listed more than once for the same term (for example, if a term appears on several lines), you can use a set to avoid duplicate items; note that sets don't keep the docs in file order:

from collections import defaultdict

inverted_index = defaultdict(set)
forward_index = defaultdict(list)
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term].update(docs)
        for doc in docs:
            forward_index[doc].append(term)

print(inverted_index)
print(forward_index)

(And as Barmar suggests, you only need to update inverted_index once per line, in the outer loop.)
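A minimal sketch of what the set buys you, assuming a hypothetical input where term1 appears on two lines and repeats doc3 (fed through io.StringIO instead of term_sample.txt). Because sets are unordered, the sketch sorts them back into lists for a stable, list-valued result. forward_index stays a list, so it still records term1 twice for doc3.

```python
import io
from collections import defaultdict

# Hypothetical sample (not from the question): term1 appears twice and repeats doc3.
sample = io.StringIO(
    "term1  doc1 doc3 doc4\n"
    "term2  doc5 doc1\n"
    "term1  doc3\n"
)

inverted_index = defaultdict(set)
forward_index = defaultdict(list)
for line in sample:
    items = line.split()
    if not items:
        continue  # skip blank lines
    term, docs = items[0], items[1:]
    inverted_index[term].update(docs)      # set: repeated docs collapse
    for doc in docs:
        forward_index[doc].append(term)    # list: keeps every occurrence

# Sets are unordered, so sort them when a stable list-valued result is needed.
inverted_plain = {term: sorted(docs) for term, docs in inverted_index.items()}
print(inverted_plain)
print(dict(forward_index))
```

If duplicates should also be dropped on the forward side, forward_index can be a defaultdict(set) too, with .add(term) in place of .append(term).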

Upvotes: 1

Barmar

Reputation: 782489

You don't need to add to inverted_index in the inner loop; that only needs to happen once for each row.

In the inner loop, you need to append to the dictionary entry if it already exists, not overwrite it.

inverted_index = {}
forward_index = {}
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term] = docs
        for doc in docs:
            forward_index.setdefault(doc, []).append(term)

print(inverted_index)
print(forward_index)
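To check this against the desired output without creating a file, here is a sketch that feeds the question's sample through io.StringIO (an assumption standing in for term_sample.txt). The detail that makes it work: setdefault(doc, []) inserts the empty list only when doc is missing and always returns the list stored in the dict, so append() extends the existing entry instead of replacing it.

```python
import io

# In-memory stand-in for term_sample.txt, assumed to hold the question's sample.
sample = io.StringIO(
    "term1  doc1 doc3 doc4\n"
    "term2  doc5 doc1\n"
    "term3  doc6 doc2\n"
)

inverted_index = {}
forward_index = {}
for line in sample:
    items = line.split()
    term, docs = items[0], items[1:]
    inverted_index[term] = docs  # one assignment per row
    for doc in docs:
        # setdefault returns the stored list (creating it if needed),
        # so append() adds to it rather than overwriting it.
        forward_index.setdefault(doc, []).append(term)

print(inverted_index)
print(forward_index)
```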

Upvotes: 3

shizhz

Reputation: 12531

inverted_index should not be updated in the inner for loop, and for forward_index, you were replacing the previous value on each pass of the inner loop. Try the following code:

inverted_index = {}
forward_index = {}
with open('test') as f:
    for line in f:
        items = line.split()
        term, docs = items[0], items[1:]
        inverted_index[term] = docs
        for doc in docs:
            terms = forward_index.get(doc, [])
            terms.append(term)
            forward_index[doc] = terms

print(inverted_index)
print(forward_index)
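The write-back line forward_index[doc] = terms matters only the first time a doc is seen. For a known doc, get() returns the list already stored in the dict and append() changes it in place; for a new doc, get() returns a fresh [] that isn't in the dict until it is assigned. A small sketch with hypothetical (doc, term) pairs compares the loop with and without that line:

```python
# Hypothetical (doc, term) pairs drawn from the question's sample.
pairs = [("doc1", "term1"), ("doc3", "term1"), ("doc1", "term2")]

# Loop from the answer: read with get(), append, write back.
forward_index = {}
for doc, term in pairs:
    terms = forward_index.get(doc, [])  # stored list, or a new empty list
    terms.append(term)
    forward_index[doc] = terms          # stores the new list on first sight

# Same loop without the write-back: the fresh [] from get() is never stored.
without_write_back = {}
for doc, term in pairs:
    terms = without_write_back.get(doc, [])
    terms.append(term)

print(forward_index)
print(without_write_back)
```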

Upvotes: 1

coder

Reputation: 12992

You could use defaultdict(list) from the collections module, because in your solution the key's value gets overwritten every time it is updated:

#!/usr/bin/env python 

from collections import defaultdict

inverted_index = defaultdict(list)
forward_index = defaultdict(list)
with open('term_sample.txt') as file:
    for line in file:
        items = line.split()
        term, docs = items[0], items[1:]
        for doc in docs:
            inverted_index[term].append(doc)
            forward_index[doc].append(term)

print(inverted_index)
print(forward_index)
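Two things to watch with defaultdict: printing it shows a defaultdict(<class 'list'>, ...) wrapper rather than the plain dicts in the question, and looking up a missing key with [] silently adds it. A sketch, again assuming the question's sample fed through io.StringIO in place of term_sample.txt, that converts the results with dict() and shows the lookup caveat:

```python
import io
from collections import defaultdict

# In-memory stand-in for term_sample.txt, assumed to hold the question's sample.
sample = io.StringIO(
    "term1  doc1 doc3 doc4\n"
    "term2  doc5 doc1\n"
    "term3  doc6 doc2\n"
)

inverted_index = defaultdict(list)
forward_index = defaultdict(list)
for line in sample:
    items = line.split()
    term, docs = items[0], items[1:]
    for doc in docs:
        inverted_index[term].append(doc)
        forward_index[doc].append(term)

# dict() drops the defaultdict wrapper, so printing shows plain dicts.
inverted_plain = dict(inverted_index)
forward_plain = dict(forward_index)
print(inverted_plain)
print(forward_plain)

# Caveat: .get() on a missing key returns None without adding it,
# but indexing with [] inserts the key with an empty list.
print(forward_index.get("doc9"))
_ = forward_index["doc9"]
print("doc9" in forward_index)
```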

Upvotes: 1
