Reputation: 449
I am looking for the most clear, Pythonic, and fastest way to check if a string contains words from a list of lists.
This is what I came up with so far:
introStrings = ['introduction:' , 'case:' , 'introduction' , 'case' ]
backgroundStrins = ['literature:' , 'background:', 'Related:' , 'literature' , 'background', 'related' ]
methodStrings = [ 'methods:' , 'method:', 'techniques:', 'methodology:' , 'methods' , 'method', 'techniques', 'methodology' ]
resultStrings = [ 'results:', 'result:', 'experimental:', 'experiments:', 'experiment:', 'results', 'result', 'experimental', 'experiments', 'experiment']
discussioStrings = [ 'discussion:' , 'Limitations:' , 'discussion' , 'limitations']
conclusionStrings = ['conclusion:' , 'conclusions:', 'concluding:' , 'conclusion' , 'conclusions', 'concluding' ]
allStrings = [ introStrings, backgroundStrins, methodStrings, resultStrings, discussioStrings, conclusionStrings ]
testtt = 'this may thod be in techniques ever material and methods'
for item in allStrings:
    for word in testtt.split():
        if word in item:
            print('yes')
            break
This code pretty much checks every combination. It's a nested for loop, and it's not easy to follow at first glance.
I am wondering if there is a better way.
Upvotes: 0
Views: 88
Reputation: 1265
What I came up with uses chain and any:
resultStrings = [
"results:",
"result:",
"experimental:",
"experiments:",
"experiment:",
"results",
"result",
"experimental",
"experiments",
"experiment",
]
conclusionStrings = [
"conclusion:",
"conclusions:",
"concluding:",
"conclusion",
"conclusions",
"concluding",
]
allStrings = [resultStrings, conclusionStrings]
testtt = "this may thod be in techniques ever material and methods"
from itertools import chain
string_set = set(chain(*allStrings))
any(i in string_set for i in testtt.split())
Although the set needs some extra memory, it makes the membership tests more efficient. Thanks, Peter Wood.
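A related option, as a minimal sketch: once the keywords are in a set, set.isdisjoint answers the yes/no question directly, and a set intersection gives the matching words. The keyword lists below are shortened for illustration, and the names has_keyword and matches are made up for this example.

```python
from itertools import chain

# Shortened keyword lists, for illustration only.
methodStrings = ["methods:", "method:", "techniques", "methods"]
resultStrings = ["results:", "result:", "results", "result"]
allStrings = [methodStrings, resultStrings]

testtt = "this may thod be in techniques ever material and methods"

# Flatten once into a set so each membership test is a hash lookup.
string_set = set(chain.from_iterable(allStrings))

# True when at least one word of the sentence is a keyword.
has_keyword = not string_set.isdisjoint(testtt.split())

# The overlapping words themselves, if they are needed (unordered).
matches = string_set & set(testtt.split())
```

Note that the intersection is a set, so it loses the order in which the words appear in the sentence.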
Upvotes: 2
Reputation: 9145
I am looking for the most clear, Pythonic, and fastest way to check if a string contains words from a list of lists
First, I'd flatten the lists
all_strings = [*intro, *back, *methods, ...] # You get the idea
(Alternatively, using a nested list comprehension)
all_strings = [word for sublist in [intro, back, ...] for word in sublist] # if you're into that
Next, split the string:
string_words = a_string.split()
Finally, just look up words:
found = [w for w in string_words if w in all_strings]
That's quite Pythonic, though I'm not sure about speed or reliability.
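On the speed question, here is a runnable sketch of the three steps above, assuming two short hypothetical lists in place of intro, back, methods and so on. The only change is that the flattened words go into a set instead of a list, so each `w in all_strings` test is a hash lookup rather than a linear scan.

```python
# Hypothetical short lists standing in for intro, back, methods, ...
intro = ["introduction", "case"]
methods = ["methods", "techniques"]

# Flatten into a set rather than a list.
all_strings = {word for sublist in [intro, methods] for word in sublist}

a_string = "this may thod be in techniques ever material and methods"
string_words = a_string.split()

# Keeps the order (and any duplicates) of the words as they appear in the string.
found = [w for w in string_words if w in all_strings]
print(found)  # ['techniques', 'methods']
```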
Upvotes: 2
Reputation: 1052
Using itertools
import itertools
merged = list(itertools.chain.from_iterable(allStrings))
[print(x) for x in testtt.split() if x in merged]
Upvotes: 1
Reputation: 33335
It would be more Pythonic to use any() with a nested generator expression:
print(any(word in sublist for word in testtt.split() for sublist in allStrings))
However this will just return true/false; it won't identify which word was found in which sublist. You can print the specific matches with this list comprehension:
print([(word, sublist) for word in testtt.split() for sublist in allStrings if word in sublist])
Your code is a bit wasteful because it calls testtt.split() more than once.
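A minimal sketch that combines both points: split once, and report which group each word matched. The category names and the dict layout are assumptions for this example (the question only has unnamed lists), and the keyword sets are shortened.

```python
# Hypothetical category names; keyword sets shortened for illustration.
categories = {
    "method": {"methods", "method", "techniques", "methodology"},
    "result": {"results", "result", "experiments"},
    "conclusion": {"conclusion", "conclusions"},
}

testtt = "this may thod be in techniques ever material and methods"
words = testtt.split()  # split once, reuse below

# Pair each matching word with the name of the category it belongs to.
matches = [
    (word, name)
    for word in words
    for name, keywords in categories.items()
    if word in keywords
]
print(matches)  # [('techniques', 'method'), ('methods', 'method')]
```

Reporting the category name instead of the whole sublist keeps the output short and readable.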
Upvotes: 3