Reputation: 416
I have a text file containing information on restaurants, and I need to insert this information into several dictionaries. The attributes are name, rating, price range and cuisine type.
Here's the content of the .txt file:
Georgie Porgie
87%
$$$
Canadian,Pub Food
Queen St. Cafe
82%
$
Malaysian,Thai
So far I've read the file and put its contents into a list:
content = []
with open(file) as f:
    content = f.readlines()
content = [x.strip() for x in content]
I need to insert this into three dictionaries, names_rating, price_names and cuisine_names. How would I go about it?
Upvotes: 1
Views: 63
Reputation: 7743
In general, to construct a dictionary from a list of lists list_of_lists, where you're mapping the item at index i of each inner list to the item at index j, you would use a dict comprehension like so:
result_dict = {lst[i]: lst[j] for lst in list_of_lists}
You should be able to apply this to any list_of_lists to solve your problem.
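As a sketch of how the comprehension applies to the question's data: the stripped lines from the content list are hard-coded below as sample data, and the sketch assumes they always come in groups of four. It also assumes each dictionary is keyed by restaurant name, which the question doesn't actually specify:

```python
# Sample data: the stripped lines from the question's file
content = [
    "Georgie Porgie", "87%", "$$$", "Canadian,Pub Food",
    "Queen St. Cafe", "82%", "$", "Malaysian,Thai",
]

# Group the flat list into one [name, rating, price, cuisine] list per restaurant
list_of_lists = [content[k:k + 4] for k in range(0, len(content), 4)]

# Index 0 (the name) is the key; a different value index feeds each dictionary
names_rating = {lst[0]: lst[1] for lst in list_of_lists}
price_names = {lst[0]: lst[2] for lst in list_of_lists}
cuisine_names = {lst[0]: lst[3].split(",") for lst in list_of_lists}

print(names_rating)
```

If the file length isn't a multiple of four, the last group will be short and the comprehensions will raise an IndexError, so the grouping assumption matters.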
Upvotes: 2
Reputation: 458
Given your latest formatting spec for the text file:
Georgie Porgie
87%
$$$
Canadian,Pub Food
Queen St. Cafe
82%
$
Malaysian,Thai
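if you can assume that every entry has exactly the same layout (name, rating, price and cuisine on four consecutive lines, followed by one separator line, which is what i % 5 in the code below relies on), then you could use the modulo operation and go for something like this: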
from pprint import pformat

content = {}
filepath = 'restaurants_new.txt'
with open(filepath, 'r') as f:
    fields = ['name', 'rating', 'price', 'cuisine']
    name = ''
    for i, line in enumerate(f):
        modulo = i % 5
        raw = line.strip()
        if modulo == 0:
            # first line of each 5-line block: the restaurant name
            name = raw
            content[name] = {}
        elif modulo < 4:
            # lines 1-3 of the block: rating, price, cuisine
            content[name][fields[modulo]] = raw
        elif modulo == 4:
            # we gathered all the required info; reset
            name = ''
print(pformat(content))
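The modulo approach above relies on every entry occupying exactly five lines. Here is a sketch of an alternative that tolerates missing or extra blank lines, assuming every non-blank line belongs to a complete four-field entry. It drops the blank lines, then groups what's left in fours by zipping the same iterator with itself (a standard Python idiom, not part of the original answer). The sample text is hypothetical and stands in for the file:

```python
import io

# Hypothetical file contents; blank separator lines are optional here
sample = io.StringIO(
    "Georgie Porgie\n87%\n$$$\nCanadian,Pub Food\n\n"
    "Queen St. Cafe\n82%\n$\nMalaysian,Thai\n"
)

# Keep only non-blank lines, stripped of surrounding whitespace
lines = [line.strip() for line in sample if line.strip()]

content = {}
# Passing the same iterator to zip four times yields consecutive 4-line records
it = iter(lines)
for name, rating, price, cuisine in zip(it, it, it, it):
    content[name] = {"rating": rating, "price": price, "cuisine": cuisine}

print(content)
```

Note that zip silently drops an incomplete final entry, so a truncated file will not raise an error.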
EDIT: the following solution was proposed for the format you posted originally, which looked like this:
Georgie Porgie 87% $$$ Canadian,Pub Food
Queen St. Cafe 82% $ Malaysian,Thai
I leave the original answer here, in case it is still useful for others.
As JohanL mentioned in his comment, the least trivial part of your problem is the line format: depending on whether commas, whitespace, or a combination of both are used as separators, and considering that a restaurant's name can contain an unknown number of words, it can be tricky to work out how to split each row.
Here's a slightly different approach from the one suggested by @gaurav, using regular expressions (the re module):
import re
from pprint import pformat

content = {}
filepath = 'restaurants.txt'
# groups: name, rating (e.g. 87%), price (one or more $), cuisine list
dictmatch = r'([\s\S]+) ([0-9]{1,3}\%) (\$+) ([\s\S]+)'
with open(filepath, 'r') as f:
    for line in f:
        raw = line.strip()
        match = re.match(dictmatch, raw)
        if not match:
            print('no match found; line skipped: "%s"' % (raw, ))
            continue
        name = match.group(1)
        if name in content:
            print('duplicate entry found; line skipped: "%s"' % (raw, ))
            continue
        content[name] = {
            "rating": match.group(2),
            "price": match.group(3),
            "cuisine": match.group(4)
        }
print(pformat(content))
The advantage of this method, assuming you have no control over the source .txt file, is that you can tailor the regex pattern to match whatever suboptimal formatting it comes with.
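To illustrate tailoring the pattern, here is a sketch that spells the same four groups with named groups and fills the three dictionaries from the question directly. The named-group pattern and the hard-coded sample rows (the original one-line-per-restaurant format) are illustrative choices and not part of the original answer:

```python
import re

# Same four fields as the pattern above, written with named groups
ROW = re.compile(r"(?P<name>.+) (?P<rating>\d{1,3}%) (?P<price>\$+) (?P<cuisine>.+)")

rows = [
    "Georgie Porgie 87% $$$ Canadian,Pub Food",
    "Queen St. Cafe 82% $ Malaysian,Thai",
]

names_rating, price_names, cuisine_names = {}, {}, {}
for row in rows:
    m = ROW.fullmatch(row)
    if m is None:
        continue  # skip lines that don't fit the expected layout
    names_rating[m["name"]] = m["rating"]
    price_names[m["name"]] = m["price"]
    cuisine_names[m["name"]] = m["cuisine"].split(",")

print(cuisine_names)
```

Because the name group is greedy, it backtracks until the rest of the line matches the rating, price and cuisine groups, so multi-word names are captured whole.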
Upvotes: 2
Reputation: 136
Looking at the example file you gave, each attribute is on its own line.
So your task would be to collect the four lines that describe each restaurant and then store them in the three dictionaries.
This would be done as follows:
names_rating = {}
price_names = {}
cuisine_names = {}
with open(file) as f:
    lines = []
    for line in f:
        content = line.rstrip()
        if content != '':
            lines.append(content)
        # four non-empty lines = one complete restaurant entry
        if len(lines) == 4:
            name = lines[0]
            rating = lines[1]
            price = lines[2]
            cuisine = lines[3].split(',')
            names_rating[name] = rating
            price_names[name] = price
            cuisine_names[name] = cuisine
            lines = []
Here, the file is read line by line and each non-empty line is appended to the list lines. Once the list holds four lines, all the attributes of one restaurant have been read. They are then processed and saved in the dictionaries, and the list is emptied so the process can start again for the next restaurant.
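To try this grouping without a file on disk, it can be wrapped in a function that accepts any iterable of lines. In the sketch below, load_restaurants is a hypothetical name, and io.StringIO stands in for the real file:

```python
import io

def load_restaurants(f):
    """Hypothetical helper: group every four non-empty lines into one restaurant."""
    names_rating, price_names, cuisine_names = {}, {}, {}
    lines = []
    for line in f:
        content = line.rstrip()
        if content:
            lines.append(content)
        # exactly four lines collected: one complete entry
        if len(lines) == 4:
            name, rating, price, cuisine = lines
            names_rating[name] = rating
            price_names[name] = price
            cuisine_names[name] = cuisine.split(",")
            lines = []
    return names_rating, price_names, cuisine_names

sample = io.StringIO(
    "Georgie Porgie\n87%\n$$$\nCanadian,Pub Food\n"
    "Queen St. Cafe\n82%\n$\nMalaysian,Thai\n"
)
names_rating, price_names, cuisine_names = load_restaurants(sample)
print(price_names)
```

Returning the three dictionaries instead of filling globals makes the function easy to reuse with a real file object, e.g. load_restaurants(open(path)).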
Upvotes: 2