17th Lvl Botanist

Reputation: 155

Find number of occurrences of a list in a string using Python

I have a list containing several thousand short strings and a .csv file containing several hundred thousand short strings. All list elements are unique. For each string in the .csv file, I need to check to see if it contains more than one list element.

For example, I have a string:

example_string = "mermaids have braids and tails"

And a list:

example_list = ["me", "ve", "az"]

Clearly the example string contains more than one list item: "me" and "ve". My code needs to indicate this. However, if the list was

example_list = ["ai", "az", "nr"]

only one list element ("ai") is contained.
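The requirement in the two examples above can be sketched as counting how many distinct list elements occur anywhere in the string. The function name `count_distinct_matches` is hypothetical, used only for illustration:

```python
def count_distinct_matches(text, elements):
    # Each element contributes at most 1, however often it occurs in text
    return sum(1 for element in elements if element in text)

example_string = "mermaids have braids and tails"

print(count_distinct_matches(example_string, ["me", "ve", "az"]))  # 2
print(count_distinct_matches(example_string, ["ai", "az", "nr"]))  # 1
```

A line "contains more than one list element" exactly when this count is greater than 1.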

I think the following code checks whether each line in my .csv file contains at least one list element. However, it doesn't tell me whether a line contains more than one distinct list element.

with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        # True if at least one element of my_list is a substring of this line
        if any(item in line for item in my_list):
            pass  # Do something
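Since the input is a .csv file, the csv module can split each row into fields before checking it. A minimal sketch, assuming (hypothetically) that the string to check sits in the first column of each row; `strings_with_multiple` is an illustrative name:

```python
import csv
import io

def strings_with_multiple(csv_file, my_list):
    """Yield each string that contains more than one distinct element of my_list.

    Assumption: the string to check is in the first column of each row.
    """
    for row in csv.reader(csv_file):
        if not row:
            continue  # csv.reader yields [] for blank lines
        text = row[0]
        if sum(1 for item in my_list if item in text) > 1:
            yield text

# io.StringIO stands in for open("my_file_of_strings.csv", newline="")
sample = io.StringIO("mermaids have braids and tails\nhello world\n")
print(list(strings_with_multiple(sample, ["me", "ve", "az"])))
```

With a real file, open it with `newline=""` (as the csv module documentation recommends) and pass the file object in place of `sample`.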

Upvotes: 2

Views: 1776

Answers (4)

RocketDonkey

Reputation: 37279

I think the other solutions are better for your purpose, but in case you want to keep track of the number of hits and which ones they were, you could try this:

In [14]: from collections import defaultdict

In [15]: example_list = ["me", "ve", "az"]

In [16]: example_string = "mermaids have braids and tails"

In [17]: d = defaultdict(int)

In [18]: for i in example_list:
   ....:     d[i] += example_string.count(i)
   ....:

In [19]: d
Out[19]: defaultdict(<type 'int'>, {'me': 1, 'az': 0, 've': 1})

And then to get the number of distinct list elements that matched:

In [20]: matches = sum(1 for v in d.values() if v)

In [21]: matches
Out[21]: 2
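A variant of the same idea for Python 3, sketched with a plain dict comprehension instead of defaultdict (every key is assigned explicitly, so no default is needed), which also keeps the list of elements that matched:

```python
example_list = ["me", "ve", "az"]
example_string = "mermaids have braids and tails"

# str.count returns the number of non-overlapping occurrences of each element
counts = {item: example_string.count(item) for item in example_list}
print(counts)  # {'me': 1, 've': 1, 'az': 0}

# Elements with a nonzero count are the ones present in the string
matched = [item for item, n in counts.items() if n]
print(matched)       # ['me', 've']
print(len(matched))  # 2
```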

Upvotes: 0

Jon Clements

Reputation: 142256

Something like:

with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        # Collect the distinct list elements found in this line
        if len(set(item for item in my_list if item in line)) > 1:
            pass  # Do something

Upvotes: 0

John Kugelman

Reputation: 362197

def contains_multiple(string, substrings):
    count = 0

    for substring in substrings:
        if substring in string:
            count += 1
            if count > 1:
                return True

    return False

for line in data:
    if contains_multiple(line, my_list):
        ...

Not short, but it will exit early as soon as it finds the 2nd match. That may or may not be an important optimization.
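The same early exit can be sketched more compactly with itertools.islice, which stops pulling from a lazy generator once it has two hits. This is an alternative formulation that should behave like the loop above, not a claim about how the original was meant to be written:

```python
from itertools import islice

def contains_multiple(string, substrings):
    # Lazily produce the substrings that occur in string
    hits = (s for s in substrings if s in string)
    # islice requests at most 2 items, so the scan ends at the second match
    return len(list(islice(hits, 2))) == 2

print(contains_multiple("mermaids have braids and tails", ["me", "ve", "az"]))  # True
print(contains_multiple("mermaids have braids and tails", ["ai", "az", "nr"]))  # False
```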

Upvotes: 1

John La Rooy

Reputation: 304503

with open("my_file_of_strings.csv", "r") as data:
    for line in data:       
        if any(item in i for i in line.split() for item in my_list):
            ...

If you need to count them, use sum():

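with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        # Testing against the whole line counts each list element at most once;
        # summing over line.split() would count an element once per word containing it
        result = sum(item in line for item in my_list)
        if result > 1:
            ...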
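With several thousand list elements and several hundred thousand lines, the approaches in this thread perform one `in` test per element per line, on the order of a billion substring searches. An alternative worth measuring, assuming the elements are short and come in only a few distinct lengths: put the elements in a set and look up every substring of each of those lengths. `distinct_matches` is a hypothetical helper name, and whether this is faster depends on the data:

```python
def distinct_matches(text, element_set, lengths):
    """Return the elements of element_set that occur as substrings of text.

    Instead of testing every element against text, slice out every substring
    of each relevant length and check it against the set (average O(1) lookup).
    """
    found = set()
    for n in lengths:
        for start in range(len(text) - n + 1):
            piece = text[start:start + n]
            if piece in element_set:
                found.add(piece)
    return found

my_list = ["me", "ve", "az"]
element_set = set(my_list)                  # build once
lengths = {len(item) for item in my_list}   # build once

found = distinct_matches("mermaids have braids and tails", element_set, lengths)
print(sorted(found))  # ['me', 've']
```

The work per line grows with the line length times the number of distinct element lengths, rather than with the number of list elements; `len(found) > 1` then answers the original question.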

Upvotes: 2
