Reputation:
I have a string and I want to match something at the start and end with a single search pattern. How can this be done?
Let's say we have a string like:
string = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"
I want to do something like this:
re.search("^ftp:// & .jpg$" ,string)
Obviously, it's incorrect, but I hope it gets my point across. Is this possible?
Upvotes: 43
Views: 160810
Reputation: 21
I had a similar issue and here's what I came up with.
If you are looking for a substring within a string, you can use the str.find() method to see where in the string your substring starts and where it ends.
In theory you could reuse one variable name for full_text, backend_text and final_text (and likewise one name for substring_start and substring_end). That would be slightly more memory-efficient, because the intermediate strings could be freed sooner, but I have named them differently to keep each step as clear as possible.
Let x be the string that marks the start of the substring you're searching for, and let y be the string that marks its end.
full_text=yourstring
substring_start=full_text.find(x)
# This will return the index of where your starting indicator first appears in your full string
backend_text=full_text[substring_start:]
# This truncates your string to start only where you indicated
substring_end=backend_text.find(y)
# This will find the index (relative to backend_text) where your substring should end
final_text=backend_text[0:substring_end]
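The steps above can be wrapped in a small function. This is a sketch only: the name extract_between is hypothetical, and it assumes you want None back when either marker is missing. That check matters because str.find returns -1 for a missing marker, and slicing with -1 would quietly return the wrong text instead of failing.

```python
def extract_between(full_text, x, y):
    """Return the text from the first x up to (not including) the next y.

    Hypothetical helper: returns None if either marker is missing.
    """
    substring_start = full_text.find(x)
    if substring_start == -1:
        return None  # start marker not found at all
    # Truncate so the end marker is searched for only after the start marker
    backend_text = full_text[substring_start:]
    substring_end = backend_text.find(y)
    if substring_end == -1:
        return None  # no end marker after the start marker
    return backend_text[:substring_end]


html = '<h3><a href="x" title="A Light in the Attic">A Light</a></h3>'
print(extract_between(html, "title", '">'))  # title="A Light in the Attic
```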
Here's a working example. Let's say your string is this whole mess:
<article class="product_pod">
<div class="image_container">
<a href="a-light-in-the-attic_1000/index.html"><img alt="A Light in the Attic" class="thumbnail" src="../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg"/></a>
</div>
<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
<div class="product_price">
<p class="price_color">£51.77</p>
<p class="instock availability">
<i class="icon-ok"></i>
In stock
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>
The following code
title_start=full_text.find("title")
backend_text=full_text[title_start:]
title_end=backend_text.find('">')
final_text=backend_text[0:title_end]
would return:
'title="A Light in the Attic'
Upvotes: 0
Reputation: 851
I wanted to extract all numbers, both ints and floats, and this works for me:
import re
s = ('[11-09 22:55:41] [INFO ] [ 4560] source_loss: 0.717, target_loss: 1.279, '
     'transfer_loss: 0.001, total_loss: 0.718')
# Convert each matched token to float if it contains a decimal point, else to int
print([float(tok) if '.' in tok else int(tok) for tok in re.findall(r'-?\d+\.?\d*', s)])
refs: https://www.tutorialspoint.com/How-to-extract-numbers-from-a-string-in-Python
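One thing to watch with the pattern above: -?\d+ also grabs the "-" in "11-09", so the date comes out as 11 and -9. Below is a sketch of a variant, assuming a minus sign should only count as a sign when it is not directly preceded by a digit (the lookbehind is an addition here, not part of the original answer or the linked reference) and that a float needs at least one digit after the point. The names NUMBER_RE and extract_numbers are hypothetical.

```python
import re

# (?<!\d) stops "-" from being read as a sign right after a digit,
# so "11-09" yields 11 and 9 rather than 11 and -9.
NUMBER_RE = re.compile(r'(?<!\d)-?\d+(?:\.\d+)?')


def extract_numbers(text):
    """Return every int/float literal in text, converted to int or float."""
    # The group is non-capturing, so findall returns the whole matched tokens
    return [float(tok) if '.' in tok else int(tok)
            for tok in NUMBER_RE.findall(text)]


log_line = ('[11-09 22:55:41] [INFO ] [ 4560] source_loss: 0.717, target_loss: 1.279, '
            'transfer_loss: 0.001, total_loss: 0.718')
print(extract_numbers(log_line))
```

A standalone negative number such as "x = -3.5" is still parsed as -3.5, since the "-" there follows a space.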
Upvotes: 0
Reputation: 95298
re.match will match the string at the beginning, in contrast to re.search:
re.match(r'(ftp|http)://.*\.(jpg|png)$', s)
A few things to note here:
r'' is used for the string literal to make it trivial to have backslashes inside the regex.
string is a standard module, so I chose s as a variable.
If you use the regex more than once, you can use r = re.compile(...) to build the state machine once and then use r.match(s) afterwards to match the strings.
If you want, you can also use the urlparse module (urllib.parse in Python 3) to parse the URL for you, though you still need to extract the extension:
>>> allowed_schemes = ('http', 'ftp')
>>> allowed_exts = ('png', 'jpg')
>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> url = urlparse("ftp://www.somewhere.com/over/the/rainbow/image.jpg")
>>> url.scheme in allowed_schemes
True
>>> url.path.rsplit('.', 1)[1] in allowed_exts
True
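The same check can be packaged as a function for Python 3. This is a sketch; is_allowed_image_url and the two constants are hypothetical names. It swaps rsplit('.', 1)[1] for posixpath.splitext, which gives an empty extension for a path with no dot instead of raising IndexError. Because urlparse separates the path from the query string, a URL like ".../pic.png?size=large" is still judged by its path.

```python
import posixpath
from urllib.parse import urlparse

ALLOWED_SCHEMES = ('http', 'ftp')
ALLOWED_EXTS = ('.png', '.jpg')  # splitext keeps the leading dot


def is_allowed_image_url(url):
    """Check the URL's scheme and the file extension of its path."""
    parsed = urlparse(url)
    # splitext returns '' as the extension when the path has none
    ext = posixpath.splitext(parsed.path)[1]
    return parsed.scheme in ALLOWED_SCHEMES and ext in ALLOWED_EXTS


print(is_allowed_image_url("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))  # True
```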
Upvotes: 46
Reputation: 601489
How about not using a regular expression at all?
if string.startswith("ftp://") and string.endswith(".jpg"):
Don't you think this reads nicer?
You can also support multiple options for start and end:
if (string.startswith(("ftp://", "http://")) and
string.endswith((".jpg", ".png"))):
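A runnable version of the multiple-options check, as a sketch (has_wanted_start_and_end, PREFIXES and SUFFIXES are hypothetical names). Note that str.startswith and str.endswith accept a single string or a tuple of strings; passing a list raises TypeError, which is why the alternatives are tuples.

```python
PREFIXES = ("ftp://", "http://")
SUFFIXES = (".jpg", ".png")


def has_wanted_start_and_end(s):
    # True if s begins with any prefix AND ends with any suffix
    return s.startswith(PREFIXES) and s.endswith(SUFFIXES)


for url in ["ftp://www.somewhere.com/over/the/rainbow/image.jpg",
            "https://example.com/photo.jpg",
            "ftp://example.com/readme.txt"]:
    print(url, has_wanted_start_and_end(url))
```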
Upvotes: 50
Reputation: 9361
import re
s = "ftp://www.somewhere.com/over/the/rainbow/image.jpg"
print(re.search(r"^ftp://.*\.jpg$", s).group(0))
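re.search returns None when nothing matches, so calling .group(0) directly raises AttributeError for a non-matching string. A minimal sketch that checks the result first (find_ftp_jpg is a hypothetical name):

```python
import re


def find_ftp_jpg(s):
    """Return s if it starts with ftp:// and ends with .jpg, else None."""
    m = re.search(r"^ftp://.*\.jpg$", s)
    # m is None when there is no match; only call .group(0) on a real match
    return m.group(0) if m else None


print(find_ftp_jpg("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))
print(find_ftp_jpg("http://www.somewhere.com/image.png"))  # None
```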
Upvotes: 5
Reputation: 39187
Try
re.search(r'^ftp://.*\.jpg$', string)
if you want a regular expression search. Note that you have to escape the period because it has a special meaning in regular expressions.
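If you test many strings, the pattern can be compiled once and reused. This sketch uses Pattern.fullmatch (Python 3.4+) instead of ^...$, which sidesteps a subtle detail: $ also matches just before a trailing newline, so the ^...$ search accepts "...image.jpg\n", while fullmatch requires the whole string to match. FTP_JPG and is_ftp_jpg are hypothetical names.

```python
import re

# Compiled once; no ^ or $ needed because fullmatch anchors both ends
FTP_JPG = re.compile(r'ftp://.*\.jpg')


def is_ftp_jpg(s):
    return FTP_JPG.fullmatch(s) is not None


print(is_ftp_jpg("ftp://www.somewhere.com/over/the/rainbow/image.jpg"))  # True
print(is_ftp_jpg("ftp://www.somewhere.com/image.jpg\n"))                 # False
```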
Upvotes: 12