Reputation:
I have a long list of strings, and I want to find the ones that contain certain substrings in a given order. Here is a small example using sentences in a text file:
This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!
Thank you so much in advance, I really appreciate it.
From this text file, I would like to find any sentences that contain both "I" and "need", but they must occur in that order.
So in this example, 'I' and 'need' both occur in sentence 1 and sentence 2, but in sentence 1 they are in the wrong order, so I do not want to return that. I only want to return the second sentence, as it has 'I need' in order.
I have used this example to identify the substrings, but I cannot figure out how to only find them in order:
id1 = "I"
id2 = "need"
with open('fun.txt') as f:
    for line in f:
        if id1 and id2 in line:
            print(line[:-1])
This returns:
This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!
But I want only:
It is new idea for me and I need your help with it please!
Thanks!
Upvotes: 1
Views: 2561
Reputation: 88226
You can define a function that computes the intersection of the two sets (the words of each sentence and the words I, need), and use sorted with a key that orders the result by order of appearance in the sentence. That way you can check whether the resulting list's order matches the one in I need:
a = ['I','need']
l = ['This is a long drawn out sentence needed to emphasize a topic I am trying to learn.',
     'It is new idea for me and I need your help with it please!',
     'Thank you so much in advance, I really appreciate it.']
A self-defined function that returns True if the strings are contained in the same order:
def same_order(l1, l2):
    words = l2.split(' ')
    # Words common to both, ordered by first appearance in the sentence
    inters = sorted(set(l1) & set(words), key=words.index)
    return inters == l1
Keep each string in the list l for which the function returns True:
[s for s in l if same_order(a, s)]
#['It is new idea for me and I need your help with it please!']
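One caveat with the approach above: split(' ') leaves punctuation attached to words, and index only looks at the first occurrence of each word. A minimal sketch of a word-level alternative follows; the helper name words_in_order and the use of \w+ as the definition of a "word" are my own assumptions, not part of the question.

```python
import re

def words_in_order(targets, sentence):
    """Return True if every word in targets occurs in sentence, in that order."""
    # \w+ strips punctuation, so 'please!' is tokenized as 'please'
    tokens = iter(re.findall(r"\w+", sentence))
    # `word in tokens` consumes the iterator up to the match, so each
    # later target is only searched for after the previous one was found.
    return all(word in tokens for word in targets)

sentences = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]
matches = [s for s in sentences if words_in_order(["I", "need"], s)]
```

Because it compares whole tokens, this version does not treat "needed" as a match for "need".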
Upvotes: 0
Reputation: 64
Just do
import re
# re.search scans the whole string; re.match only matches at the start
match = re.search('pattern', 'yourString')
https://developers.google.com/edu/python/regular-expressions
So the pattern you are looking for is 'I(.*)need' (see the related question "Regex Match all characters between two strings"). You may have to construct your pattern differently, as I don't know if there are exceptions. If so, you can run the regex twice: once to get a subset of your original string, and again to get the exact match you want.
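As a rough sketch of what such a pattern could look like in practice: a bare 'I(.*)need' also matches the "I" in "It" and the "need" in "needed", so the version below adds \b word boundaries and escapes the inputs. The function name contains_in_order is hypothetical, and whether you want whole-word matching at all is an assumption about the requirements.

```python
import re

def contains_in_order(first, second, text):
    """Return True if `first` occurs as a whole word before `second` in text."""
    # re.escape keeps any regex metacharacters in the inputs literal;
    # \b restricts matches to whole words.
    pattern = r"\b{}\b.*\b{}\b".format(re.escape(first), re.escape(second))
    # re.search scans the whole string, unlike re.match.
    return re.search(pattern, text) is not None

hit = contains_in_order("I", "need", "It is new idea for me and I need your help with it please!")
miss = contains_in_order("I", "need", "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.")
```

If substring matches are acceptable, dropping the \b markers gives the looser behaviour.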
Upvotes: 0
Reputation: 1905
You can use a regular expression to check for this. One possible solution is this:
import re

id1 = "I"
id2 = "need"
# re.escape keeps any regex metacharacters in the ids literal
regex = re.compile(r'^.*{}.*{}.*$'.format(re.escape(id1), re.escape(id2)))
with open('fun.txt') as f:
    for line in f:
        if regex.search(line):
            print(line.rstrip('\n'))
Upvotes: 1
Reputation: 77837
You need to identify id2 in the portion of the line after id1:
infile = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]
id1 = "I"
id2 = "need"
for line in infile:
    if id1 in line:
        pos1 = line.index(id1)
        if id2 in line[pos1+len(id1) :] :
            print(line)
Output:
It is new idea for me and I need your help with it please!
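The same idea extends to more than two substrings. Here is a minimal sketch using str.find with its start argument, so no slices are created; the function name substrings_in_order is made up for illustration, and like the code above it matches plain substrings rather than whole words.

```python
def substrings_in_order(line, *parts):
    """Return True if all parts occur in line, left to right, without overlap."""
    pos = 0
    for part in parts:
        # Search only from the end of the previous match onward
        found = line.find(part, pos)
        if found == -1:
            return False
        pos = found + len(part)
    return True

infile = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]
selected = [line for line in infile if substrings_in_order(line, "I", "need")]
```

Taking the first occurrence of each part is enough: if a later part follows any occurrence of an earlier one, it also follows the first.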
Upvotes: 1