user5265408


Finding substrings in a certain order in python

I have a long list of strings which contain substrings of interest in the order they are given, but here is a small example using sentences in a text file:

This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!
Thank you so much in advance, I really appreciate it.

From this text file, I would like to find any sentences that contain both "I" and "need" but they must occur in that order.

So in this example, 'I' and 'need' both occur in sentence 1 and sentence 2 but in sentence 1 they are in the wrong order, so I do not want to return that. I only want to return the second sentence, as it has 'I need' in order.

I have used this example to identify the substrings, but I cannot figure out how to only find them in order:

id1 = "I"
id2 = "need"

with open('fun.txt') as f:
    for line in f:
        if id1 and id2 in line:
            print(line[:-1])

This returns:

This is a long drawn out sentence needed to emphasize a topic I am trying to learn.
It is new idea for me and I need your help with it please!

But I want only:

It is new idea for me and I need your help with it please!

Thanks!

Upvotes: 1

Views: 2561

Answers (4)

yatu

Reputation: 88226

You can define a function that takes the intersection of two sets (the words of each sentence, and the words in a, here I and need) and then uses sorted with a key that orders the result by each word's first position in the sentence. If the sorted list equals a, the words appear in the sentence in the required order:

a = ['I','need']
l = ['This is a long drawn out sentence needed to emphasize a topic I am trying to learn.',
'It is new idea for me and I need your help with it please!',
'Thank you so much in advance, I really appreciate it.']

A helper function that returns True if the strings appear in the same order:

def same_order(l1, l2):
    words = l2.split(' ')
    # Words common to both, ordered by first appearance in the sentence
    inters = sorted(set(l1) & set(words), key=words.index)
    return inters == l1

Keep each string in the list l for which same_order returns True:

[s for s in l if same_order(a, s)]
#['It is new idea for me and I need your help with it please!']
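If a word can appear more than once in a sentence, the set-based check above only looks at each word's first position, so something like "need I need" is rejected. A possible word-level alternative is sketched below. It assumes whitespace-separated words with no attached punctuation, and words_in_order is just an illustrative name, not something from the answer above:

```python
def words_in_order(targets, sentence):
    """Return True if every word in targets occurs in sentence, in that order."""
    words = iter(sentence.split())
    # `t in words` consumes the iterator up to and including the first match,
    # so each later target is only searched for in the remaining words.
    return all(t in words for t in targets)


sentences = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

kept = [s for s in sentences if words_in_order(["I", "need"], s)]
print(kept)
```

Because it compares whole words, "needed" does not count as "need", and "need." (with trailing punctuation) would not either.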

Upvotes: 0

melp

Reputation: 64

Just use re.search (re.match only matches at the start of the string):

  import re
  match = re.search('pattern', 'yourString')

https://developers.google.com/edu/python/regular-expressions

So the pattern you are looking for is 'I(.*)need' (see "Regex Match all characters between two strings"). You may have to construct your pattern differently, as I don't know whether there are exceptions. If there are, you can run the regex twice: once to get a subset of your original string, and again to get the exact match you want.
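As a rough sketch, here is that pattern applied to the three sentences from the question. It uses re.search because re.match only tries to match at position 0 of the string; the list and variable names are just for illustration:

```python
import re

sentences = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

pattern = re.compile(r"I(.*)need")

# search() scans the whole string for "I", then anything, then "need"
matches = [s for s in sentences if pattern.search(s)]
print(matches)

# match() is anchored at the start, so it misses an "I" later in the string
print(pattern.match("And I need help"))   # None
print(pattern.search("And I need help"))  # a match object
```

Note that the pattern matches any capital "I", including the one in "It", so it is looser than a whole-word check.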

Upvotes: 0

Felix

Reputation: 1905

You can use a regular expression to check for this. One possible solution is this:

import re

id1 = "I"
id2 = "need"
# re.escape keeps any special characters in the ids literal
regex = re.compile(r'^.*{}.*{}.*$'.format(re.escape(id1), re.escape(id2)))

with open('fun.txt') as f:
    for line in f:
        if regex.search(line):
            print(line.rstrip('\n'))
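One caveat: this pattern matches the ids anywhere, so "It needs work" would also match ("I" in "It", "need" in "needs"). If you only want whole words, a variant with \b word boundaries might look like the sketch below. It assumes both ids start and end with word characters, and ordered_word_pattern is a made-up helper name:

```python
import re


def ordered_word_pattern(first, second):
    """Build a regex matching `first` and then `second` as whole words."""
    return re.compile(
        r"\b{}\b.*\b{}\b".format(re.escape(first), re.escape(second))
    )


regex = ordered_word_pattern("I", "need")

lines = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

hits = [line for line in lines if regex.search(line)]
print(hits)
```

With the boundaries in place, "It needs work" no longer matches, while the second sentence still does.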

Upvotes: 1

Prune

Reputation: 77837

You need to identify id2 in the portion of the line after id1:

infile = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

id1 = "I"
id2 = "need"

for line in infile:
    if id1 in line:
        pos1 = line.index(id1)
        # Look for id2 only in the text after the first id1
        if id2 in line[pos1 + len(id1):]:
            print(line)

Output:

It is new idea for me and I need your help with it please!
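The same idea extends to more than two substrings. Here is a minimal sketch using str.find with a start position, so no slicing is needed; contains_in_order is a hypothetical helper name, not part of the code above:

```python
def contains_in_order(text, parts):
    """Return True if each string in parts occurs in text, left to right."""
    pos = 0
    for part in parts:
        # Search only from the end of the previous match onwards
        found = text.find(part, pos)
        if found == -1:
            return False
        pos = found + len(part)
    return True


infile = [
    "This is a long drawn out sentence needed to emphasize a topic I am trying to learn.",
    "It is new idea for me and I need your help with it please!",
    "Thank you so much in advance, I really appreciate it.",
]

result = [line for line in infile if contains_in_order(line, ["I", "need"])]
print(result)
```

Like the loop above, this works on raw substrings, so "I" also matches inside "It" and "need" inside "needed".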

Upvotes: 1
