Reputation: 155
I have a list containing several thousand short strings and a .csv file containing several hundred thousand short strings. All list elements are unique. For each string in the .csv file, I need to check to see if it contains more than one list element.
For example, I have a string:
example_string = "mermaids have braids and tails"
And a list:
example_list = ["me", "ve", "az"]
Clearly the example string contains more than one list item: me and ve. My code needs to indicate this. However, if the list were
example_list = ["ai", "az", "nr"]
only one list element is contained.
I think that the following code will check to see if each line in my .csv file contains at least one list element. However, that doesn't tell me if it contains more than one different list element.
data = open("my_file_of_strings.csv", "r").readlines()
for line in data:
    # True if at least one list element occurs in the line
    if any(item in line for item in my_list):
        pass  # Do something
Upvotes: 2
Views: 1776
Reputation: 37279
I think the other solutions are better for your purpose, but in case you want to keep track of the number of hits and which ones they were, you could try this:
In [14]: from collections import defaultdict
In [15]: example_list = ["me", "ve", "az"]
In [16]: example_string = "mermaids have braids and tails"
In [17]: d = defaultdict(int)
In [18]: for i in example_list:
   ....:     d[i] += example_string.count(i)
   ....:
In [19]: d
Out[19]: defaultdict(<type 'int'>, {'me': 1, 'az': 0, 've': 1})
And then to get the total number of unique matches:
In [20]: matches = sum(1 for v in d.values() if v)
In [21]: matches
Out[21]: 2
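If only the matched items themselves are needed rather than per-item counts, a list comprehension may be enough. This is a minimal sketch using the same example data; the name `matched` is introduced here purely for illustration:

```python
example_list = ["me", "ve", "az"]
example_string = "mermaids have braids and tails"

# Keep each list element that occurs somewhere in the string.
matched = [item for item in example_list if item in example_string]

print(matched)           # ['me', 've']
print(len(matched) > 1)  # True
```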
Upvotes: 0
Reputation: 142256
Something like:
data = open("my_file_of_strings.csv", "r").readlines()
for line in data:
    # Collect the distinct list elements found in this line
    if len(set(item for item in my_list if item in line)) > 1:
        pass  # Do something
Upvotes: 0
Reputation: 362197
def contains_multiple(string, substrings):
    count = 0
    for substring in substrings:
        if substring in string:
            count += 1
            if count > 1:
                return True
    return False

for line in data:
    if contains_multiple(line, my_list):
        ...
Not short, but it will exit early as soon as it finds the 2nd match. That may or may not be an important optimization.
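The same early exit can probably be written more compactly with `itertools.islice`, which stops consuming a generator after a fixed number of items. This is only a sketch; the function name `contains_multiple_lazy` is hypothetical and the inputs are the example data from the question:

```python
from itertools import islice

def contains_multiple_lazy(string, substrings):
    # Generator of matching substrings; nothing is tested until it is consumed.
    hits = (s for s in substrings if s in string)
    # islice pulls at most two hits, so the scan stops at the second match.
    return len(list(islice(hits, 2))) > 1

print(contains_multiple_lazy("mermaids have braids and tails", ["me", "ve", "az"]))  # True
print(contains_multiple_lazy("mermaids have braids and tails", ["ai", "az", "nr"]))  # False
```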
Upvotes: 1
Reputation: 304503
with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        if any(item in i for i in line.split() for item in my_list):
            ...
If you need to count them, use sum():
with open("my_file_of_strings.csv", "r") as data:
    for line in data:
        result = sum(item in i for i in line.split() for item in my_list)
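Note that this sum() counts every (word, item) pair, so an item that appears in several words is counted several times. If the goal is the number of different list elements per line, something like the following may be closer. It is a sketch that uses an in-memory sample line instead of the file, and the helper name `count_distinct` is hypothetical:

```python
def count_distinct(line, items):
    # Count how many different items occur anywhere in the line.
    return sum(1 for item in items if item in line)

my_list = ["ai", "az", "nr"]
line = "mermaids have braids and tails"

# Per-word counting sees "ai" in three separate words.
per_word = sum(item in word for word in line.split() for item in my_list)

print(per_word)                       # 3
print(count_distinct(line, my_list))  # 1
```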
Upvotes: 2