erwanlc

Python: count the number of strings in a list that contain a substring from another list, without duplicates

I have two lists:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

I want to count how many strings in main_list contain a string from master_list, without counting the same item twice.

Example: for the two lists above, my function should return 4. 'Smith' can be found 3 times in main_list. 'Roger' can be found 2 times, but since 'Smith' was already found in 'Roger-Smith', that item no longer counts, so 'Roger' is counted only once, which makes 4 in total.

The function I have written so far is below, but I think there is a faster way to do it:

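def string_detection(master_list, main_list):
    count = 0
    remaining = list(main_list)  # copy once so the caller's list is left untouched
    for substring in master_list:
        for string in list(remaining):  # iterate over a snapshot while removing
            if substring in string:
                remaining.remove(string)  # a matched string cannot be counted again
                count += 1
    return count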

Answers (6)

Paul Rooney

A one-liner:

>>> sum(any(m in L for m in master_list) for L in main_list)
4

This iterates over main_list and checks whether any of the values from master_list occur in each string, producing one bool value per string. Because any() stops as soon as it finds a match, each string adds at most one to the count. Conveniently, sum counts the Trues (True is 1 and False is 0), which gives the total.
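
The two behaviors this relies on, any() stopping at the first match and True counting as 1 in sum(), can be observed with a small sketch that records which comparisons any() actually performs. The helper contains, the checked list and count_matching are hypothetical names added for illustration, and the expected values assume the example lists from the question:

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

checked = []  # every (master, string) pair that any() actually tests

def contains(m, s):
    checked.append((m, s))
    return m in s

def count_matching(master_list, main_list):
    # any() short-circuits, so each string contributes at most one True
    return sum(any(contains(m, s) for m in master_list) for s in main_list)

print(count_matching(master_list, main_list))  # 4
# False: any() stopped at 'Smith' and never tested 'Roger' against 'Roger-Smith'
print(('Roger', 'Roger-Smith') in checked)
print(True + True)  # 2, because bool is a subclass of int
```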

zwer

If your master_list is not expected to be huge, one way to do it is with regex:

import re

def string_detection(master_list, main_list):
    count = 0
    # escape each entry so characters such as '+' or '.' are matched literally
    master = re.compile("|".join(map(re.escape, master_list)))
    for entry in main_list:
        if master.search(entry):
            count += 1
    return count
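
One edge case is worth guarding against when the pattern is built this way. This sketch assumes master_list holds plain strings rather than regex patterns: an empty master_list produces the pattern "", which matches every entry, so the count would equal len(main_list) instead of 0. The entry 'C++' below is a hypothetical example of a string that needs re.escape to compile and match literally:

```python
import re

def string_detection(master_list, main_list):
    # An empty alternation compiles to "", which matches every string,
    # so return 0 explicitly when there is nothing to search for.
    if not master_list:
        return 0
    # re.escape makes entries such as 'C++' match literally.
    master = re.compile("|".join(map(re.escape, master_list)))
    # Each entry counts once, however many master strings it contains.
    return sum(1 for entry in main_list if master.search(entry))

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
print(string_detection(['Smith', 'Roger'], main_list))  # 4
print(string_detection([], main_list))                  # 0
print(string_detection(['C++'], ['C++ guide', 'C']))    # 1
```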

Olian04

This would do it:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

i = 0
for elem in main_list:
    if elem in master_list:
        i += 1
        continue
    for master_elem in master_list:
        if master_elem in elem:
            i += 1
            break

print(i) # i = 4

The code above counts 'Roger-Smith' once; if you want it to count once for each master element it contains, remove the break.
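
The effect of the break can be compared side by side in a minimal sketch; count_hits and its once_per_string flag are hypothetical names, not part of the answer above:

```python
def count_hits(master_list, main_list, once_per_string=True):
    total = 0
    for elem in main_list:
        for master_elem in master_list:
            if master_elem in elem:
                total += 1
                if once_per_string:
                    break  # stop at the first master element found in elem
    return total

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(count_hits(master_list, main_list))                         # 4
print(count_hits(master_list, main_list, once_per_string=False))  # 5
```

The elem in master_list shortcut from the answer is left out here; with the break in place it gives the same result, because an exact match is also a substring match.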

Elmex80s

What about this:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

print(len([word for word in main_list if any(mw in word for mw in master_list)]))

Yevhen Kuzmovych

You can do it the other way around: create a list that contains only the elements of main_list that have a substring from master_list:

temp_list = [string for string in main_list if any(substring in string for substring in master_list)]

Now temp_list looks like this:

['Smith', 'Smith', 'Roger', 'Roger-Smith']

So len(temp_list), which is 4 here, is your answer.

Yuval Atzmon

You can use pandas, which provides vectorized string methods on a Series, with str.contains and sum():

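import re
import pandas as pd

main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
master_list = ['Smith', 'Roger']
# str.contains treats the pattern as a regex, so escape each entry to match it literally
count = main_list.str.contains('|'.join(map(re.escape, master_list))).sum()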
