TheLoneWolf91193

Reputation: 416

Read a file and insert content to dictionaries

I have a text file containing information on restaurants, and I need to insert this information into several dictionaries. The attributes are name, rating, price range, and cuisine type.

Here's the content of the text file:

Georgie Porgie 
87% 
$$$ 
Canadian,Pub Food

Queen St. Cafe 
82% 
$ 
Malaysian,Thai

So far I've read the file and grabbed the contents to a list.

content = []
with open(file) as f:
    content = f.readlines()
    content = [x.strip() for x in content]

I need to insert the data into three dictionaries: names_rating, price_names, and cuisine_names. How would I go about it?

Upvotes: 1

Views: 63

Answers (3)

Jon Kiparsky

Reputation: 7743

In general, to construct a dictionary from a list of lists list_of_lists, mapping the item at index i of each inner list to the item at index j, you would use a dict comprehension like so (despite its name, list_of_dicts below is a single dictionary):

list_of_dicts = {lst[i]: lst[j] for lst in list_of_lists}

You should be able to apply this to any arbitrary list_of_lists to solve your problem.
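As a rough sketch of how this might apply to the restaurant file: it assumes the stripped lines are already in content, as in the question, and that every record is exactly four non-empty lines. The sample data is inlined, the records name is only illustrative, and keying all three dictionaries by restaurant name is an assumption, since the question does not say which way the mappings should go.

```python
# Sample lines as they would look after readlines() and strip();
# inlined here so the sketch runs without a file.
content = [
    "Georgie Porgie", "87%", "$$$", "Canadian,Pub Food", "",
    "Queen St. Cafe", "82%", "$", "Malaysian,Thai",
]

# Drop the blank separators, then cut the rest into 4-line records.
lines = [line for line in content if line]
records = [lines[k:k + 4] for k in range(0, len(lines), 4)]

# One dict comprehension per dictionary: index 0 is the name,
# 1 the rating, 2 the price range, 3 the cuisine string.
# Keying everything by name is an assumption.
names_rating = {rec[0]: rec[1] for rec in records}
price_names = {rec[0]: rec[2] for rec in records}
cuisine_names = {rec[0]: rec[3].split(",") for rec in records}
```

Here i is always 0 (the name) and j is 1, 2, or 3 depending on which attribute you want as the value.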

Upvotes: 2

mapofemergence

Reputation: 458

Given your latest formatting spec for the text file:

Georgie Porgie 
87% 
$$$ 
Canadian,Pub Food

Queen St. Cafe 
82% 
$ 
Malaysian,Thai

if you can assume that:

  • every restaurant entry will always be defined by four lines, each containing the fields you are after (read: dictionary entries)
  • the fields will always appear in the same exact order
  • each entry will always be separated by the next one via an empty line

then you could use the modulo operation and go for something like this:

from pprint import pformat

content = {}
filepath = 'restaurants_new.txt'
with open(filepath, 'r') as f:
    fields = ['name', 'rating', 'price', 'cuisine']
    name = ''
    for i, line in enumerate(f):
        # position of this line within its 5-line block (4 fields + blank)
        modulo = i % 5
        raw = line.strip()
        if modulo == 0:
            # first line of a block: the restaurant name
            name = raw
            content[name] = {}
        elif modulo < 4:
            content[name][fields[modulo]] = raw
        elif modulo == 4:
            # we gathered all the required info; reset
            name = ''

print(pformat(content))
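If you still need the three separate dictionaries from the question, they could be derived from content afterwards. This is only a sketch: the question does not say what price_names and cuisine_names should map, so here I assume they go from a price range (or a cuisine) to the list of restaurant names that have it. The sample data is inlined so the snippet runs on its own.

```python
from collections import defaultdict

# Nested dict in the same shape as the content dict built above
# (sample data inlined).
content = {
    "Georgie Porgie": {"rating": "87%", "price": "$$$", "cuisine": "Canadian,Pub Food"},
    "Queen St. Cafe": {"rating": "82%", "price": "$", "cuisine": "Malaysian,Thai"},
}

names_rating = {}
price_names = defaultdict(list)    # assumed mapping: price range -> names
cuisine_names = defaultdict(list)  # assumed mapping: cuisine -> names

for name, info in content.items():
    names_rating[name] = info["rating"]
    price_names[info["price"]].append(name)
    # a restaurant can list several comma-separated cuisines
    for cuisine in info["cuisine"].split(","):
        cuisine_names[cuisine.strip()].append(name)
```

defaultdict(list) saves you from checking whether a key already exists before appending to it.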

EDIT: the following solution was written for the formatting you posted originally, which looked like this:

Georgie Porgie 87% $$$ Canadian,Pub Food
Queen St. Cafe 82% $ Malaysian,Thai

I leave the original answer here, in case it is still useful for others.

As JohanL mentioned in his comment, the least trivial bit of the solution to your problem is the line formatting: depending on whether you have commas or whitespace as separators, or a combination of both, and considering that restaurants' names can contain an unknown number of words, it might become tricky to find how to split your row.

Here's a slightly different approach from the one suggested by @gaurav, using regular expressions (re module):

import re
from pprint import pformat

content = {}
filepath = 'restaurants.txt'
# groups: name, rating (1-3 digits plus %), price (one or more $), cuisine
dictmatch = r'([\s\S]+) ([0-9]{1,3}\%) (\$+) ([\s\S]+)'
with open(filepath, 'r') as f:
    for line in f:
        raw = line.strip()
        match = re.match(dictmatch, raw)
        if not match:
            print('no match found; line skipped: "%s"' % (raw, ))
            continue
        name = match.group(1)
        if name in content:
            print('duplicate entry found; line skipped: "%s"' % (raw, ))
            continue
        content[name] = {
            "rating": match.group(2),
            "price": match.group(3),
            "cuisine": match.group(4)
        }

print(pformat(content))

The advantage of this method, assuming you have no control over the source text file, is that you can tailor the regex pattern to match whatever "suboptimal" formatting it comes with.
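To give an idea of what tailoring could look like, here is a small variant of the same pattern using named groups, run against the two sample lines instead of a file. The group names and the samples and parsed variables are my own choices for illustration.

```python
import re

# Same pattern as above, with named groups for readability.
# The group names are illustrative only.
pattern = re.compile(
    r'(?P<name>[\s\S]+) (?P<rating>[0-9]{1,3}%) (?P<price>\$+) (?P<cuisine>[\s\S]+)'
)

samples = [
    'Georgie Porgie 87% $$$ Canadian,Pub Food',
    'Queen St. Cafe 82% $ Malaysian,Thai',
]

parsed = {}
for raw in samples:
    match = pattern.match(raw)
    if match:
        fields = match.groupdict()
        # use the name as the key, keep the other fields as the value
        parsed[fields.pop('name')] = fields

# parsed['Queen St. Cafe'] -> {'rating': '82%', 'price': '$', 'cuisine': 'Malaysian,Thai'}
```

With named groups, adding or reordering a field in the pattern does not force you to renumber the match.group() calls.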

Upvotes: 2

gaurav

Reputation: 136

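Judging by the example file you gave, each attribute is on its own line and the restaurants are separated by a blank line.

So, your task would be to:

  • Open the file
  • Read each line
  • Collect the non-empty lines in groups of four, splitting the cuisine line on commas
  • Save the entries in the dictionaries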

This would be done as follows:

names_rating = {}
price_names = {}
cuisine_names = {}
# file is the path to the text file, as in the question
with open(file) as f:
    lines = []
    for line in f:
        content = line.rstrip()
        if content != '':
            lines.append(content)
        # four non-empty lines make one complete restaurant entry
        if len(lines) == 4:
            name = lines[0]
            rating = lines[1]
            price = lines[2]
            cuisine = lines[3].split(',')
            names_rating[name] = rating
            price_names[name] = price
            cuisine_names[name] = cuisine
            lines = []

Here the file is read line by line and each non-empty line is appended to the list lines. Once the list holds four entries, all the attributes of one restaurant have been read. They are then saved in the dictionaries, and the list is emptied so the process can repeat for the next restaurant.

Upvotes: 2
