How can I match the start and end in Python's regex?

I have a string and I want to match something at the start and end with a single search pattern. How can this be done?

Let's say we have a string like:

 string = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"

I want to do something like this:

 re.search("^ftp:// & .jpg$" ,string)

Obviously, it's incorrect, but I hope it gets my point across. Is this possible?

Upvotes: 43

Answers (7)

BadgerTaco

Reputation: 21

I had a similar issue and here's what I came up with.

If you are looking for a substring within a string, you can use the str.find() method to see where in the string your substring starts and where it ends.

In theory you could reuse a single variable name for everything named x_text in my code, and a single name for the values labeled substring_start or substring_end.
That would be the more memory-efficient approach, but I have named them differently in the hope of making each step as clear as possible.

Let x = a string that represents the start of the substring you're searching for, and let y = the same for the end of that substring.

full_text=yourstring

substring_start=full_text.find(x)  
# This will return the index of where your starting indicator first appears in your full string

backend_text=full_text[substring_start:]
# This truncates your string to start only where you indicated

substring_end=backend_text.find(y)
# This will find the index (relative to backend_text) where your string should end

final_text=backend_text[0:substring_end]
# This keeps everything from the start indicator up to, but not including, the end indicator

Here's a working example. Let's say your string is this whole mess:

<article class="product_pod">
<div class="image_container">
<a href="a-light-in-the-attic_1000/index.html"><img alt="A Light in the Attic" class="thumbnail" src="../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg"/></a>
</div>
<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
<div class="product_price">
<p class="price_color">£51.77</p>
<p class="instock availability">
<i class="icon-ok"></i>
    
        In stock
    
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>

The following code

title_start=full_text.find("title")
backend_text=full_text[title_start:]
title_end=backend_text.find('">')
final_text=backend_text[0:title_end]

would return:

'title="A Light in the Attic'
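
The steps above can be wrapped in a small helper. This is only a sketch: the name extract_between is made up for illustration, and it assumes you want None back when either marker is missing. That matters because str.find() returns -1 when nothing is found, and slicing with -1 would quietly give you the wrong text.

```python
def extract_between(full_text, start_marker, end_marker):
    """Return the text from start_marker up to (not including) end_marker.

    Hypothetical helper for illustration, not a standard library function.
    Returns None if either marker cannot be found.
    """
    start = full_text.find(start_marker)
    if start == -1:
        return None
    # Look for the end marker only from the start marker onwards
    end = full_text.find(end_marker, start)
    if end == -1:
        return None
    return full_text[start:end]


html = '<h3><a href="index.html" title="A Light in the Attic">A Light in the ...</a></h3>'
print(extract_between(html, "title", '">'))
```

Passing the start index as the second argument to find() avoids building the intermediate backend_text string.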

Upvotes: 0

JKirchartz

Reputation: 18022

Don't be greedy, use ^ftp://(.*?)\.jpg$
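
A minimal sketch of how that pattern could be used, with the question's URL as the sample input. The capture group returns whatever sits between the prefix and the extension. Note that with both ^ and $ in place, the lazy (.*?) and the greedy (.*) end up matching the same text here; the difference only shows once the pattern is no longer anchored at the end.

```python
import re

s = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"

# (.*?) captures as little as it can between "ftp://" and the final ".jpg"
m = re.search(r"^ftp://(.*?)\.jpg$", s)
if m:
    print(m.group(1))  # www.somewhere.com/over/the/rainbow/image
```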

Upvotes: 19

Colin Wang

Reputation: 851

I wanted to extract all the numbers, both ints and floats, and this works for me:

import re

s = ('[11-09 22:55:41] [INFO ]  [  4560] source_loss: 0.717, target_loss: 1.279, '
     'transfer_loss:  0.001, total_loss:  0.718')

# Convert each match to float if it has a decimal point, otherwise to int
print([float(n) if '.' in n else int(n) for n in re.findall(r'-?\d+\.?\d*', s)])

refs: https://www.tutorialspoint.com/How-to-extract-numbers-from-a-string-in-Python

Upvotes: 0

Niklas B.

Reputation: 95298

re.match will match the string at the beginning, in contrast to re.search:

re.match(r'(ftp|http)://.*\.(jpg|png)$', s)

Two things to note here:

r'' is used for the string literal to make it trivial to have backslashes inside the regex.
string is a standard module, so I chose s as the variable name.
If you use a regex more than once, you can use r = re.compile(...) to build the state machine once and then use r.match(s) afterwards to match the strings.
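
As a rough sketch of the compile-once idea: the name url_re and the sample URLs below are made up for illustration. Because match() only tries at the beginning of the string, the third URL is rejected even though it contains a valid-looking address further in.

```python
import re

# Compile once, then reuse the pattern object for every URL
url_re = re.compile(r'(ftp|http)://.*\.(jpg|png)$')

urls = [
    "ftp://www.somewhere.com/over/the/rainbow/image.jpg",
    "http://example.com/picture.png",
    "see ftp://example.com/a.jpg",   # does not start with the scheme
    "ftp://example.com/readme.txt",  # wrong extension
]

matches = [bool(url_re.match(u)) for u in urls]
print(matches)  # [True, True, False, False]
```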

If you want, you can also use the urlparse module (urllib.parse in Python 3) to parse the URL for you (though you still need to extract the extension):

>>> allowed_schemes = ('http', 'ftp')
>>> allowed_exts = ('png', 'jpg')
>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> url = urlparse("ftp://www.somewhere.com/over/the/rainbow/image.jpg")
>>> url.scheme in allowed_schemes
True
>>> url.path.rsplit('.', 1)[1] in allowed_exts
True
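
The same check could be sketched as a Python 3 function. The name is_allowed_url is hypothetical, and using posixpath.splitext for the extension is my own substitution for the rsplit call above; it returns an empty string when the path has no dot, whereas rsplit('.', 1)[1] would raise an IndexError.

```python
import posixpath
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("http", "ftp")
ALLOWED_EXTS = (".png", ".jpg")


def is_allowed_url(url):
    """Hypothetical helper: check the scheme and the extension of the URL path."""
    parts = urlparse(url)
    # splitext works on the path only, so a query string does not interfere
    ext = posixpath.splitext(parts.path)[1].lower()
    return parts.scheme in ALLOWED_SCHEMES and ext in ALLOWED_EXTS


print(is_allowed_url("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))
print(is_allowed_url("http://example.com/"))
```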

Upvotes: 46

Sven Marnach

Reputation: 601489

How about not using a regular expression at all?

if string.startswith("ftp://") and string.endswith(".jpg"):

Don't you think this reads nicer?

You can also support multiple options for start and end:

if (string.startswith(("ftp://", "http://")) and 
    string.endswith((".jpg", ".png"))):
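
For completeness, here is that condition as a small runnable sketch. The function name looks_like_image_url and the sample URLs are made up for illustration; the tuple arguments to startswith() and endswith() are standard str behaviour.

```python
def looks_like_image_url(url):
    """Hypothetical helper: True if url has an allowed prefix and an allowed suffix."""
    return (url.startswith(("ftp://", "http://")) and
            url.endswith((".jpg", ".png")))


print(looks_like_image_url("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))
print(looks_like_image_url("ftp://www.somewhere.com/over/the/rainbow/notes.txt"))
```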

Upvotes: 50

Roman Bataev

Reputation: 9361

import re

s = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"
print(re.search(r"^ftp://.*\.jpg$", s).group(0))

Upvotes: 5

Howard

Reputation: 39187

Try

 re.search(r'^ftp://.*\.jpg$' ,string)

if you want a regular expression search. Note that you have to escape the period because it has a special meaning in regular expressions.
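
A quick sketch of why the escape matters, using a made-up URL that ends in "Xjpg" rather than ".jpg". Without the backslash, the dot matches any character, so the pattern accepts it.

```python
import re

unescaped = re.compile(r'^ftp://.*.jpg$')   # "." matches any character
escaped = re.compile(r'^ftp://.*\.jpg$')    # "\." matches a literal period

s = "ftp://example.com/imageXjpg"

print(bool(unescaped.search(s)))  # True
print(bool(escaped.search(s)))    # False
```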

Upvotes: 12
