Reputation: 307
I have two lists:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
I want to count how many strings in main_list contain a string from master_list, without counting the same item twice.
Example: for the two lists above, the result of my function should be 4. 'Smith' can be found 3 times in main_list. 'Roger' can be found 2 times, but since 'Smith' was already found in 'Roger-Smith', that item doesn't count again, so 'Roger' is counted only once, which makes 4 in total.
The function I wrote for now is below, but I think there is a faster way to do it:
def string_detection(master_list, main_list):
    count = 0
    for substring in master_list:
        temp = list(main_list)
        for string in temp:
            if substring in string:
                main_list.remove(string)
                count += 1
    return count
Upvotes: 5
Views: 5696
Reputation: 21609
A one-liner:
>>> sum(any(m in L for m in master_list) for L in main_list)
4
Iterate over main_list and check if any of the values from master_list are in that string. This yields one bool value per string. any stops as soon as it finds a match, so each string adds at most one to the count. Conveniently, sum counts all the True values to give you the count.
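For readers who find the one-liner dense, the same logic can be sketched as an explicit loop. The function name count_matching_strings is made up for this illustration and is not part of the original answer.

```python
def count_matching_strings(main_list, master_list):
    """Count strings in main_list that contain at least one master_list entry."""
    count = 0
    for s in main_list:
        # any() short-circuits: it stops at the first master string found in s,
        # so a string such as 'Roger-Smith' contributes at most 1.
        if any(m in s for m in master_list):
            count += 1
    return count


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(count_matching_strings(main_list, master_list))  # 4
```

Unlike the function in the question, this version does not remove anything from main_list.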
Upvotes: 9
Reputation: 25789
If your master_list is not expected to be huge, one way to do it is with regex:
import re
def string_detection(master_list, main_list):
    count = 0
    master = re.compile("|".join(master_list))
    for entry in main_list:
        if master.search(entry):
            count += 1
    return count
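One caveat with joining the strings directly: any regex metacharacters in master_list (such as '.' or '+') are interpreted as pattern syntax, and an empty master_list produces an empty pattern that matches every string. A possible variant, assuming the entries are meant as literal substrings, escapes them with re.escape first. The name count_with_regex is hypothetical.

```python
import re


def count_with_regex(main_list, master_list):
    """Count entries of main_list containing any master_list string literally."""
    if not master_list:
        # An empty alternation would compile to '' and match everything.
        return 0
    # re.escape makes characters like '.' or '+' match themselves.
    pattern = re.compile("|".join(re.escape(name) for name in master_list))
    return sum(1 for entry in main_list if pattern.search(entry))


print(count_with_regex(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'],
                       ['Smith', 'Roger']))  # 4
print(count_with_regex(['a.b', 'axb'], ['a.b']))  # 1
```

Without the escaping, the pattern 'a.b' would also match 'axb', and the second call would report 2.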
Upvotes: 1
Reputation: 6872
This would do it:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
i = 0
for elem in main_list:
    if elem in master_list:
        i += 1
        continue
    for master_elem in master_list:
        if master_elem in elem:
            i += 1
            break
print(i)  # i = 4
The code above counts 'Roger-Smith' as 1. If you want it to count once for each master string it contains, remove the break.
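To make the difference between the two counting modes concrete, here is a small sketch. The function count_matches and its once_per_string flag are invented for this example. It also simplifies the loop above slightly: it has no special case for exact matches, which only matters if one master string is a substring of another.

```python
def count_matches(main_list, master_list, once_per_string=True):
    """Count substring hits, either once per string or once per (string, master) pair."""
    total = 0
    for elem in main_list:
        # Number of distinct master strings that occur inside elem.
        hits = sum(1 for m in master_list if m in elem)
        # With once_per_string, a string counts at most once (the 'break' behaviour).
        total += min(hits, 1) if once_per_string else hits
    return total


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(count_matches(main_list, master_list))                         # 4
print(count_matches(main_list, master_list, once_per_string=False))  # 5
```

In the second call 'Roger-Smith' contributes 2, one for each master string it contains.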
Upvotes: 1
Reputation: 3504
What about this:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(len([word for word in main_list if any(mw in word for mw in master_list)]))
Upvotes: 2
Reputation: 12140
You can do it the other way around. Create a list that contains only the elements from main_list that have a substring from master_list:
temp_list = [string for string in main_list if any(substring in string for substring in master_list)]
Now temp_list looks like this:
['Smith', 'Smith', 'Roger', 'Roger-Smith']
So the length of temp_list is your answer.
Upvotes: 2
Reputation: 5945
You can use pandas (which provides fast vectorized operations) with str.contains and sum():
import pandas as pd
main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
master_list = ['Smith', 'Roger']
count = main_list.str.contains('|'.join(master_list)).sum()
Upvotes: 2