Reputation: 81
I recently tried to write a regex that removes consecutive duplicate strings, i.e. strings that follow each other without being interrupted by another string, keeping only one copy. My work so far: https://regex101.com/r/Cs0bmY/7 . It should work with all possible URLs, including ones that don't start with www. or that have a different ending such as .com or .nl. The strings (a list of URLs) look like this:
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
operator.livrareflori.md
amazon.de
fonts.gstatic.com
fonts.gstatic.com
fonts.gstatic.com
erovoyeurism.net
tugtechnologyandbusiness.com
The end result should look like this:
operator.livrareflori.md
amazon.de
fonts.gstatic.com
erovoyeurism.net
tugtechnologyandbusiness.com
You can see that the duplicate strings that are not interrupted by another string are gone and only one copy of each remains.
Upvotes: 1
Views: 246
Reputation: 67968
((?:https?://)?(?:www\.)?\S+\.\S+)\s(?=[\s\S]*\1)
You can try this. See the demo:
https://regex101.com/r/Cs0bmY/11
Upvotes: 1
Reputation: 1949
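The trick is to capture the line and use a lookahead to verify that it occurs again later in the subject. This expression matches every line that has a later duplicate, so substituting "" for each match keeps only the last occurrence (multiline mode must be enabled so that ^ and $ match at line boundaries):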
(?s)^((?:https?://)?(?:www\.)?\S+\.\S+)\n(?=.*^\1$)
https://regex101.com/r/Cs0bmY/10
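For anyone who wants to try the lookahead idea outside regex101, here is a minimal Python sketch. The function name `drop_earlier_duplicates` and the sample input are made up for illustration, and the `m` flag is added on the assumption that the demo runs with multiline mode enabled. Note that this pattern also removes a line when its duplicate appears further down, not only directly after it, which may or may not be what you want.

```python
import re

# (?s) lets "." cross line breaks inside the lookahead;
# (?m) makes "^" and "$" match at line boundaries (assumed to be on in the demo).
PATTERN = re.compile(r"(?sm)^((?:https?://)?(?:www\.)?\S+\.\S+)\n(?=.*^\1$)")


def drop_earlier_duplicates(text: str) -> str:
    """Remove every line that occurs again later; the last occurrence stays."""
    return PATTERN.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample, shortened from the list in the question.
    sample = (
        "operator.livrareflori.md\n"
        "operator.livrareflori.md\n"
        "amazon.de\n"
        "fonts.gstatic.com\n"
        "fonts.gstatic.com\n"
        "erovoyeurism.net\n"
    )
    print(drop_earlier_duplicates(sample), end="")

    # Non-adjacent duplicates are removed as well:
    print(drop_earlier_duplicates("a.com\nb.com\na.com\n"), end="")
```

The first call prints the four distinct hosts once each. The second call prints `b.com` followed by `a.com`, because the first `a.com` has a later copy even though the two are not adjacent.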
Upvotes: 1
Reputation: 91415
Using Notepad++, you can do:
Find what: ^(.+)$(?:\R\1)+
Replace with: $1
Uncheck ". matches newline" (otherwise .+ would run across line breaks)
Explanation:
^(.+)$ : group 1, a whole line
(?: : non-capturing group
\R : any kind of line break
\1 : backreference to group 1
)+ : group must appear 1 or more times
Replacement:
$1 : content of group 1
Result for given example:
operator.livrareflori.md
amazon.de
fonts.gstatic.com
erovoyeurism.net
tugtechnologyandbusiness.com
Upvotes: 1
Reputation: 370769
You can match
^(.+)$(?:\n\1)+
This captures the first line and matches any duplicate lines that directly follow it. Then replace everything matched with the first capture group:
\1
(or the equivalent keyword for the first group in whatever environment you're in)
https://regex101.com/r/Cs0bmY/8
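As a rough illustration of how this could look in Python (the function name `collapse_consecutive` is hypothetical, not part of any library), here is a small sketch. It adds one thing that is not in the pattern above: a `$` after `\1`, so that a line such as `abc` followed by `abcd` is not treated as a duplicate.

```python
import re


def collapse_consecutive(text: str) -> str:
    """Collapse runs of identical adjacent lines into a single line."""
    # ^(.+)$      : group 1, a whole line (re.MULTILINE makes ^ and $ per-line)
    # (?:\n\1$)+  : one or more directly following copies of that line;
    #               the trailing $ is an addition that forces a full-line match
    return re.sub(r"^(.+)$(?:\n\1$)+", r"\1", text, flags=re.MULTILINE)


if __name__ == "__main__":
    # Shortened version of the list from the question.
    urls = "\n".join(
        ["operator.livrareflori.md"] * 3
        + ["amazon.de"]
        + ["fonts.gstatic.com"] * 3
        + ["erovoyeurism.net", "tugtechnologyandbusiness.com"]
    )
    print(collapse_consecutive(urls))
    # operator.livrareflori.md
    # amazon.de
    # fonts.gstatic.com
    # erovoyeurism.net
    # tugtechnologyandbusiness.com
```

Unlike a lookahead-based pattern, this only touches duplicates that are directly adjacent, so `a.com`, `b.com`, `a.com` is left unchanged.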
Upvotes: 1