Reputation: 33
I am trying to extract multiple domain names from a string: those that start with either http or https and end in .com.
The string is:
string="jssbhshhahttps://www.one.comsbshhshshttp://www.another.comhehsbwkwkwjhttp://www.again.co.uksbsbs"
I have created the pattern as follows:
pattern=re.compile("https?://")
I am not sure how to finish it off.
I would like to return a list of all domains that start with http or Https and end in .com only, so no .co.uk domains in the output.
I have tried using (.*) in the middle to represent any run of characters, but I am not sure how to finish it off.
Any help would be much appreciated and it would be great if all parts of the expression could be explained.
Upvotes: 2
Views: 112
Reputation: 627341
You can use
https?://(?:(?!https?://)\S)*?\.com
See the regex demo. You may pass the case-insensitive flag re.I or add the (?i) inline flag to make the regex case insensitive.
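As a minimal sketch of how the pattern above could be applied to the string from the question, the snippet below compiles it with re.I and collects every match with findall. The variable names are only illustrative.

```python
import re

# Input string taken from the question
string = "jssbhshhahttps://www.one.comsbshhshshttp://www.another.comhehsbwkwkwjhttp://www.again.co.uksbsbs"

# re.I makes the match case insensitive, so "Https://" would also be accepted
pattern = re.compile(r"https?://(?:(?!https?://)\S)*?\.com", re.I)

# The pattern has no capturing groups, so findall returns the full matches
matches = pattern.findall(string)
print(matches)  # ['https://www.one.com', 'http://www.another.com']
```

The .co.uk URL is skipped because no .com follows it before the string ends.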
Details
https?:// - http:// or https://
(?:(?!https?://)\S)*? - any non-whitespace char, zero or more but as few as possible occurrences, not starting a http:// or https:// char sequence (this regex construct is known under a "tempered greedy token" name)
\.com - a .com string.
Upvotes: 1
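To illustrate why the tempered greedy token matters, here is a small comparison against a plain lazy .*? pattern. The input string `text` is a made-up example (not from the question) in which a .co.uk URL comes before a .com URL; the names `naive` and `tempered` are likewise only illustrative.

```python
import re

# Hypothetical input: a .co.uk URL appears before a .com URL
text = "xhttp://www.again.co.ukyyhttps://www.one.comzz"

# Plain lazy match: nothing stops it from running past the next "https://"
naive = re.compile(r"https?://.*?\.com")

# Tempered version: each consumed char must not start another http(s):// sequence
tempered = re.compile(r"https?://(?:(?!https?://)\S)*?\.com")

naive_matches = naive.findall(text)
tempered_matches = tempered.findall(text)

print(naive_matches)     # ['http://www.again.co.ukyyhttps://www.one.com']
print(tempered_matches)  # ['https://www.one.com']
```

The plain lazy pattern starts at the .co.uk URL and keeps going until it reaches the first .com, swallowing the second URL. The tempered pattern fails at the first http:// because it cannot cross the following https://, so the engine retries from the second URL and returns only that one.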