Reputation: 300
I have a small (<100) list of chemical names called detected_chems, and a second, much larger (>1000) iterable: a dictionary chem_db containing chemical names as keys and a dictionary of chemical properties as values. Like this:
{'chemicalx':{'property1':'smells','property2':'poisonous'},
'chemicaly':{'property1':'stinks','property2':'toxic'}}
I am trying to match all the detected chemicals with those in the database and pull their properties.
I have studied these questions/answers but can't seem to apply them to my case (sorry).
So I am making a list of results, res, but instead of nested for loops with an if x in condition, I've created this:
res = [{chem:chem_db[chem]}
for det_chem in detected_chems
for chem in chem_db.keys()
if det_chem in chem]
This works to an extent!
What I (think I) am doing here is creating a list of dictionaries, each holding one key:value pair of a chemical name (key) and the information about that chemical (itself a dictionary, as the value), whenever the detected chemical is found somewhere in the chemical database (chem_db).
The problem is not all the detected chemicals are found in the database. This is probably because of misspelling or name variation (e.g. they include numbers) or something similar.
So to solve the problem I need to identify which detected chemicals are not being matched. I thought this might be a solution:
not_matched=[]
res = [{chem:chem_db[chem]}
for det_chem in detected_chems
for chem in chem_db.keys()
if det_chem in chem else not_matched.append(det_chem)]
I am getting a syntax error due to the else not_matched.append(det_chem) part.
I have two questions:
1) Where should I put the else condition to avoid the syntax error?
2) Can the not_matched list be built within the list comprehension, so that I don't have to create the empty list first?
res = [{chem:chem_db[chem]}
for det_chem in detected_chems
for chem in chem_db.keys()
if det_chem in chem else print(det_chem)]
What I'd like to achieve is something like:
in: len(detected_chems)
out: 20
in: len(res)
out: 18
in: len(not_matched)
out: 2
in: print(not_matched)
out: ['chemical_strange_character$$','chemical___WeirdSPELLING']
That will help me troubleshoot the matching.
Upvotes: 3
Views: 624
Reputation: 34086
You should use
if det_chem in chem or not_matched.append(det_chem)
but that being said, if you clean up a bit as per the comments, I think there is a much more efficient way of doing what you want. The explanation of the above is that append returns None, so whenever det_chem in chem is False the whole if-condition evaluates to None, which is falsy (but the item is still appended to the not_matched list).
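To make the mechanism concrete, here is a minimal sketch with made-up data (chem_a, chem_b and chem_z are hypothetical names, not from the question). It also shows a side effect worth knowing about: inside the nested loop, the append runs once for every database key that fails the test, so a name can land in not_matched several times, even when it does match some other key.

```python
# Hypothetical sample data, for illustration only.
chem_db = {"chem_a": {"property1": "smells"}, "chem_b": {"property1": "stinks"}}
detected_chems = ["chem_a", "chem_z"]

# list.append returns None, which is falsy, so `x or lst.append(y)` is falsy
# whenever x is falsy, while the append still happens as a side effect.
scratch = []
append_result = scratch.append("anything")
print(append_result)  # None

not_matched = []
res = [{chem: chem_db[chem]}
       for det_chem in detected_chems
       for chem in chem_db
       if det_chem in chem or not_matched.append(det_chem)]

print(res)          # [{'chem_a': {'property1': 'smells'}}]
print(not_matched)  # ['chem_a', 'chem_z', 'chem_z']
```

Note that chem_a shows up in not_matched because it failed the test against chem_b, which is one more reason to drop the inner loop.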
Re: efficiency:
res = [{det_chem:chem_db[det_chem]}
for det_chem in detected_chems
if det_chem in chem_db or not_matched.append(det_chem)]
This should be drastically faster: looping over the dictionary keys is an O(n) operation, while dictionaries are used precisely because lookup is O(1) on average. So instead of retrieving the keys and comparing them one by one, we use the hash-based det_chem in chem_db lookup.
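One caveat with switching to the dictionary lookup: it tests for an exact key, whereas the original det_chem in chem test on strings is a substring match. A small sketch with hypothetical names shows the difference, so it is worth checking which behaviour the data actually needs:

```python
# Hypothetical sample data, for illustration only.
chem_db = {"benzene": {"property1": "smells"},
           "methylbenzene": {"property1": "stinks"}}
det_chem = "benz"

# Membership on a dict checks for an exact key (hash lookup).
exact = det_chem in chem_db
print(exact)  # False

# Membership on each key string checks for a substring (linear scan of keys).
substring_hits = [chem for chem in chem_db if det_chem in chem]
print(substring_hits)  # ['benzene', 'methylbenzene']
```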
Bonus: dict comprehension (to address question 2)
I am not sure why a list of one-key dicts is being built; what is probably needed is a dict comprehension, as in:
chem_db = {1: 2, 4: 5}
detected_chems = [1, 3]
not_matched = []
res = {det_chem: chem_db[det_chem] for det_chem in detected_chems if
det_chem in chem_db or not_matched.append(det_chem)}
# output
print(res) # {1: 2}
print(not_matched) # [3]
There is no way I can think of to build the not_matched list while also building res using a single list/dict comprehension.
Upvotes: 4
Reputation: 765
The sample code below gives the output you want. It uses a dictionary comprehension instead of a list comprehension, so the matched items are collected in a single dictionary rather than a list; that makes it easier to look up their properties later. Also, you don't need to call chem_db.keys(), because the "in" operator already searches the whole container (be it a list or a dictionary), and when the container is a dict it tests the target against the dictionary's keys.
Code:
detected_chems=['chemical_strange_character$$','chemical___WeirdSPELLING','chem3','chem4','chem5','chem6','chem7','chem8','chem9','chem10','chem11','chem12','chem13','chem14','chem15','chem16','chem17','chem18','chem19','chem20']
chem_db = {'chem1':{'property1':'smells','property2':'poisonous'},'chem2':{'property1':'stinks','property2':'toxic'},'chem3':{'property1':'smells','property2':'poisonous'},'chem4':{'property1':'smells','property2':'poisonous'},'chem5':{'property1':'smells','property2':'poisonous'},'chem6':{'property1':'smells','property2':'poisonous'},'chem7':{'property1':'smells','property2':'poisonous'},'chem8':{'property1':'smells','property2':'poisonous'},'chem9':{'property1':'smells','property2':'poisonous'},'chem10':{'property1':'smells','property2':'poisonous'},'chem11':{'property1':'smells','property2':'poisonous'},'chem12':{'property1':'smells','property2':'poisonous'},'chem13':{'property1':'smells','property2':'poisonous'},'chem14':{'property1':'smells','property2':'poisonous'},'chem15':{'property1':'smells','property2':'poisonous'},'chem16':{'property1':'smells','property2':'poisonous'},'chem17':{'property1':'smells','property2':'poisonous'},'chem18':{'property1':'smells','property2':'poisonous'},'chem19':{'property1':'smells','property2':'poisonous'},'chem20':{'property1':'smells','property2':'poisonous'}}
not_matched = []
res = {det_chem:chem_db[det_chem]
for det_chem in detected_chems
if det_chem in chem_db or not_matched.append(det_chem)}
print(len(detected_chems))
print(len(res))
print(len(not_matched))
print(not_matched)
The output:
20
18
2
['chemical_strange_character$$', 'chemical___WeirdSPELLING']
For further information, see: dictionary in python
Upvotes: 0
Reputation: 6219
Your syntax error comes from the fact that the filter part of a comprehension does not accept an else clause.
You could possibly use the ... if ... else ... ternary operator to determine the value to put in your comprehension result. Something like below:
not_matched=[]
res = [{chem:chem_db[chem]} if det_chem in chem else not_matched.append(det_chem)
for det_chem in detected_chems
for chem in chem_db.keys()]
But it would be a bad idea, since you would then have None in your res for each non-match. This is because the ... if ... else ... operator always returns a value, and in your case that value would be the return value of the list.append method (= None).
You could then filter the res list to remove the None values but... meh...
A better solution would be to simply keep your first comprehension and get the difference between the original chem list and the res list:
not_matched = set(chems).difference(<the already matched chems>)
Note that I used a "the already matched chems" placeholder instead of a real chunk of code because the way you store your res is not practical at all. Indeed, it is a list of single-key dictionaries, which makes little sense: the role of a dictionary is to hold multiple values identified by keys.
A solution to this would be to make res a dictionary instead of a list, using a dict comprehension:
res = {chem: chem_db[chem]
for det_chem in detected_chems
for chem in chem_db.keys()
if det_chem in chem}
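Doing this, the "the already matched chems" placeholder could be replaced by res.keys() (the values of res are the property dictionaries, not the names). This assumes the detected names match the database keys exactly.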
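As a rough sketch of the set-difference idea, with hypothetical sample data: the keys of the res dictionary are the matched database names, so subtracting them from the detected names leaves the unmatched ones. This only holds when a detected name is identical to its database key; a detected name that matched merely as a substring of a longer key would still be reported as unmatched.

```python
# Hypothetical sample data, for illustration only.
chem_db = {"chem_a": {"property1": "smells"}, "chem_b": {"property1": "stinks"}}
detected_chems = ["chem_a", "chem_z"]

res = {chem: chem_db[chem]
       for det_chem in detected_chems
       for chem in chem_db
       if det_chem in chem}

# Iterating over a dict yields its keys, so passing res directly is
# equivalent to passing res.keys().
not_matched = set(detected_chems).difference(res)

print(res)          # {'chem_a': {'property1': 'smells'}}
print(not_matched)  # {'chem_z'}
```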
As an aside, even if comprehensions are a really cool feature in a lot of cases, they are not a miracle tool that should be used everywhere. And nested comprehensions are a real pain to read and should be avoided (in my opinion at least).
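For comparison, a plain loop version might look like the sketch below. It is only one possible shape: it assumes exact key matches (a dictionary lookup rather than a substring test), and the sample data is hypothetical.

```python
# Hypothetical sample data, for illustration only.
chem_db = {"chem_a": {"property1": "smells"}, "chem_b": {"property1": "stinks"}}
detected_chems = ["chem_a", "chem_z"]

res = {}
not_matched = []
for det_chem in detected_chems:
    if det_chem in chem_db:          # exact key lookup, O(1) on average
        res[det_chem] = chem_db[det_chem]
    else:                            # a real else branch, no side-effect tricks
        not_matched.append(det_chem)

print(res)          # {'chem_a': {'property1': 'smells'}}
print(not_matched)  # ['chem_z']
```

It is a few lines longer than a comprehension, but both results are built in one pass and the else branch reads exactly as intended.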
Upvotes: 1
Reputation: 14201
A list comprehension formally consists of up to 3 parts. Let's show them in an example:
[2 * i for i in range(10) if i % 3 == 0]
The first part is an expression, and it may be (or may contain) the ternary operator (x if y else z).
The second part is one or more for clauses (several in the case of nested for loops), each drawing values for a variable from an iterable.
The third part (optional) is a filter that selects among the values produced by part 2, and the else clause is not allowed here!
So if you want to use the else branch, you have to put it into the first part, for example
[2 * i if i < 5 else 3 * i for i in range(10) if i % 3 == 0]
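The two comprehensions above can be run as below to see what each part contributes. The last few lines use the built-in compile function to confirm, without crashing the script, that an else in the filter position is rejected by the parser; the variable names are just for illustration.

```python
# Filter only: keep multiples of 3, then double them.
doubled = [2 * i for i in range(10) if i % 3 == 0]
print(doubled)  # [0, 6, 12, 18]

# Ternary in the expression part plus a filter: the filter still decides
# which i are kept, the ternary decides what value each kept i produces.
mixed = [2 * i if i < 5 else 3 * i for i in range(10) if i % 3 == 0]
print(mixed)  # [0, 6, 18, 27]

# An else attached to the filter is a syntax error.
try:
    compile("[i for i in range(10) if i % 3 == 0 else 0]", "<example>", "eval")
    else_in_filter_allowed = True
except SyntaxError:
    else_in_filter_allowed = False
print(else_in_filter_allowed)  # False
```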
Upvotes: 1