Reputation: 543
I have following two python lists.
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
min_len_sec_list = 3
max_len_sec_list = 5
I want to create a dictionary with elements from first list as keys and with following constraints :
False
.False
.Eg:
(i) while checking 123
, if 1234
, 12345
exists (123*
) in second list then value of 123
will be False
.
(ii). Similarly while checking 1234
, if 12345
exists (1234*
) then value will be False
.
Here *
will be [0-9]{(max_len-len_token)}
True
.OUTPUT :
final_token_dict
{'119': False,'120': True, '123': False, '1234': False, '12345': True}
Can I get any suggestions on how to achieve this? Thanks in advance!!!
Upvotes: 6
Views: 175
Reputation: 71451
You can use any
:
a = ['119', '120', '123', '1234', '12345']
b = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
new_d = {c:c in b and not any(i.startswith(c) and len(c) < len(i) for i in b) for c in a}
Output:
{'120': True, '1234': False, '119': False, '123': False, '12345': True}
Upvotes: 3
Reputation: 12005
Just a normal dict comprehension with the value being True
if sum of number of elements in complete_tokens
that begins with the specified key is 1, would do the job
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
res = {elem:sum(v.startswith(elem) for v in complete_tokens)==1 for elem in prob_tokens}
print (res)
Output
{'119': False, '120': True, '123': False, '1234': False, '12345': True}
For better efficiency, you can convert complete_tokens
as a set and then use any
instead of checking each and every element
complete_tokens_set = set(complete_tokens)
res = {elem:elem in complete_tokens_set and not any(v!=elem and v.startswith(elem) for v in complete_tokens_set) for elem in prob_tokens}
Upvotes: 2
Reputation: 82899
You can convert your list into a Trie, or Prefix Tree, structure, then check whether any of the keys is a prefix in that Trie. This will be faster than checking whether its a prefix of each element in the list individually. More specifically, if you have k
elements in your prob_tokens
list, and n
elements in complete_tokens
, then this will make only O(n+k), whereas checking each pair is O(n*k).1
def make_trie(lst):
trie = {}
for key in lst:
t = trie
for c in key:
t = t.setdefault(c, {})
return trie
def check_trie(trie, key):
for c in key:
trie = trie.get(c, None)
if trie is None: return False # not in trie
if trie == {}: return True # leaf in trie
return False # in trie, but not a leaf
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
trie = make_trie(complete_tokens)
# {'1': {'1': {'2': {}}, '2': {'0': {}, '1': {}, '3': {'3': {}, '4': {'5': {}}, '5': {}}}}}
res = {key: check_trie(trie, key) for key in prob_tokens}
# {'119': False, '120': True, '123': False, '1234': False, '12345': True}
1) Actually, the average length of the keys also is a factor, but it is so in both approaches.
Upvotes: 4
Reputation: 26315
I guess you could also try something like this:
from collections import Counter
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
result = {}
for token in prob_tokens:
token_len = len(token)
# Create counts of prefix lengths
counts = Counter(c[:token_len] for c in complete_tokens)
# Set True if only one prefix found, else False
result[token] = counts[token] == 1
print(result)
Which Outputs:
{'119': False, '120': True, '123': False, '1234': False, '12345': True}
Upvotes: 2
Reputation: 11333
This might be another alternative
import re
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
dictionary = dict()
for tok in prob_tokens:
if tok not in complete_tokens or any([bool(re.compile(r'^%s\d+'%tok).search(tok2)) for tok2 in complete_tokens]):
dictionary[tok] = False
else:
dictionary[tok] = True
print(dictionary)`
Upvotes: 3
Reputation: 164623
You can use a custom function with a dictionary comprehension:
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
def mapper(val, ref_list):
if any(x.startswith(val) and (len(x) > len(val)) for x in ref_list):
return False
if val in ref_list:
return True
return False
res = {i: mapper(i, complete_tokens) for i in prob_tokens}
print(res)
{'119': False, '120': True, '123': False, '1234': False, '12345': True}
If the number of characters criterion is important to you, you can adjust your logic accordingly using chained comparisons and an additional input:
def mapper(val, ref_list, max_len):
if any(x.startswith(val) and (0 < (len(x) - len(val)) <= max_len) for x in ref_list):
return False
if val in ref_list:
return True
return False
min_len_sec_list = 3
max_len_sec_list = 5
add_lens = max_len_sec_list - min_len_sec_list
res = {i: mapper(i, complete_tokens, add_lens) for i in prob_tokens}
Upvotes: 7