Who8daPie

Reputation: 45

Create a dictionary from text file

I am trying to create a dictionary from a text file, where each key is a single lowercase character and each value is a list of the words from the file that start with that letter.

The text file contains one lowercase word per line, e.g.:

airport
bathroom
boss
bottle
elephant

Output:

words = {'a': ['airport'], 'b': ['bathroom', 'boss', 'bottle'], 'e':['elephant']}

I haven't got a lot done really; I'm just confused about how to get the first character of each line, set it as the key, and append the values. I would really appreciate it if someone could help me get started.

words = {}

for line in infile:
  line = line.strip() # not sure if this line is correct

Upvotes: 3

Views: 8311

Answers (2)

Niklas B.

Reputation: 95308

So let's examine your example:

words = {}
for line in infile:
  line = line.strip()

This looks good for a beginning. Now you want to do something with the line. Probably you'll need the first character, which you can access through line[0]:

  first = line[0]

Then you want to check whether the letter is already in the dict. If not, you can add a new, empty list:

  if first not in words:
    words[first] = []

Then you can append the word to that list:

  words[first].append(line)

And you're done!
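
Putting the steps above together, a minimal sketch might look like the following. The function name build_index is a hypothetical helper, io.StringIO stands in for the opened file, and the blank-line check is an added assumption (without it, line[0] raises IndexError on an empty line):

```python
import io

def build_index(lines):
    """Group words by their first character (hypothetical helper name)."""
    words = {}
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines are skipped, since line[0] would fail on them
            continue
        first = line[0]
        if first not in words:
            words[first] = []
        words[first].append(line)
    return words

# io.StringIO stands in for the real file object here
infile = io.StringIO("airport\nbathroom\nboss\nbottle\nelephant\n")
words = build_index(infile)
print(words)
```

With a real file you would pass the object returned by open() instead of the StringIO stand-in.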

If the lines are already sorted like in your example file, you can also make use of itertools.groupby, which is a bit more sophisticated:

from itertools import groupby
from operator import itemgetter

with open('infile.txt', 'r') as f:
  # build each list right away: groupby's groups are consumed as soon as it advances
  words = {k: [line.strip() for line in g] for k, g in groupby(f, key=itemgetter(0))}

You can also sort the lines first, which makes this method generally applicable:

groupby(sorted(f), ...)
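
As a rough sketch of the sorted variant, assuming the input lines arrive in arbitrary order: io.StringIO again stands in for the file, and the sample word order is made up for illustration. Stripping before sorting also drops blank lines, which would otherwise break itemgetter(0):

```python
import io
from itertools import groupby
from operator import itemgetter

# Unsorted sample input; io.StringIO stands in for the real file
f = io.StringIO("boss\nelephant\nairport\nbottle\nbathroom\n")

# Strip and drop blank lines first, then sort so equal first letters are adjacent
lines = sorted(line.strip() for line in f if line.strip())

# list(g) materialises each group before groupby moves on to the next key
words = {k: list(g) for k, g in groupby(lines, key=itemgetter(0))}
print(words)
```

Note that sorting reads the whole file into memory and orders the words within each list alphabetically, which may or may not be what you want.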

Upvotes: 2

wim

Reputation: 362707

defaultdict from the collections module is a good choice for this kind of task:

>>> import collections
>>> words = collections.defaultdict(list)
>>> with open('/tmp/spam.txt') as f:
...   lines = [l.strip() for l in f if l.strip()]
... 
>>> lines
['airport', 'bathroom', 'boss', 'bottle', 'elephant']
>>> for word in lines:
...   words[word[0]].append(word)
... 
>>> print(words)
defaultdict(<class 'list'>, {'a': ['airport'], 'b': ['bathroom', 'boss', 'bottle'], 'e': ['elephant']})
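
One thing to be aware of: indexing a missing key on a defaultdict (for example words['z']) silently creates an empty list for it. If that is not wanted once the grouping is finished, a possible approach is to convert the result to a plain dict. The sketch below assumes the same sample data, with io.StringIO standing in for the file:

```python
import collections
import io

words = collections.defaultdict(list)

# io.StringIO stands in for the real file; the blank line is there to show it is skipped
f = io.StringIO("airport\n\nbathroom\nboss\nbottle\nelephant\n")
for line in f:
    word = line.strip()
    if word:
        words[word[0]].append(word)

# Freeze the result: a plain dict raises KeyError instead of creating new keys
words = dict(words)
print(words)
```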

Upvotes: 1
