How to choose first match from Alternation regex?

Question

I am trying to extract all the text from the tweets before the URL starting with "https:...".

Example Tweet:

"This traditional hairdo is back in fashion thanks to the coronavirus, and Kenyans are using it to raise awareness https://... (Video via @QuickTake)"

In this example I would like to remove the "https://... (Video via @QuickTake)" part and keep the text from the beginning. It should also work when the tweet contains no URL at all.

I tried the expression below, but it returns two matches when the tweet contains a URL:

/(.*)(?=\shttps.*)|(.*)

How can I make it retrieve only the text of the tweet?

Thanks in advance!

Wiktor Stribiżew · Accepted Answer

You can remove https and everything that follows it up to the end of the string with:

tweet = re.sub(r'\s*https.*', '', tweet)

Details:

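\s* - zero or more whitespace characters
https - the literal string https
.* - the rest of the string (up to the end of the line, since . does not match a newline by default).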
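
As a quick check, the substitution above can be run against both cases from the question: a tweet with a URL and one without. This is only a minimal sketch; the helper name strip_url_tail and the second sample tweet are made up for illustration and are not part of the original question.

```python
import re

def strip_url_tail(tweet: str) -> str:
    # Hypothetical helper: drop optional whitespace, the literal "https",
    # and everything after it up to the end of the line.
    return re.sub(r'\s*https.*', '', tweet)

# Sample tweet taken from the question (URL left elided as in the original).
with_url = ("This traditional hairdo is back in fashion thanks to the coronavirus, "
            "and Kenyans are using it to raise awareness https://... (Video via @QuickTake)")

# Made-up sample with no URL, to show the text is returned unchanged.
without_url = "Just a plain tweet with no link"

print(strip_url_tail(with_url))
print(strip_url_tail(without_url))
```

When the pattern finds nothing, re.sub returns the input unchanged, so no alternation is needed for the no-URL case. If a tweet can span several lines and everything after the URL should go, passing flags=re.DOTALL to re.sub makes . match newlines as well.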
