Reputation: 1081
I have the following strings:
/youtube.com/videos/cats
/google.com/images/dogs
I'm trying to find a regex that captures the text up to and including the second slash, ignoring the rest of the string.
So the result would look like this:
/youtube.com/
/google.com/
For reference, I am using Python 3.7.
I have tried positive lookbehinds, and the closest I got was this:
[^/]/
Any help is appreciated.
Upvotes: 0
Views: 2505
Reputation: 6148
Input
/youtube.com/videos/cats
/google.com/images/dogs
RegEx
/.*?/ : [1]
/ : First slash
.*? : Non-greedy match anything
/ : Second slash
/* Output:
FULL MATCH
1> /youtube.com/
2> /google.com/
*/
/(.*?)/ : [2] : Same as [1] BUT captures the text between the slashes as a group {symbol: (...)}
^/(.*?)/ : [3] : Same as [2] BUT specifies match must be at start of string {symbol: ^}
/* Output:
FULL MATCH CAPTURE GROUP
1> /youtube.com/ youtube.com
2> /google.com/ google.com
*/
(/(.*?)(?=$|/)) : [4] : Captures text between all slashes
** FLAGS: MULTILINE
** Unless passed in individually (i.e. one expression per URL)
/* Output:
FULL MATCH CAPTURE GROUP
1> /youtube.com youtube.com
2> /videos videos
3> /cats cats
4> /google.com google.com
5> /images images
6> /dogs dogs
*/
/(.*?)(?=$|/) : [5] : Same as [4] BUT doesn't capture leading slashes
** FLAGS: MULTILINE
** Unless passed in individually (i.e. one expression per URL)
/* Output:
CAPTURE GROUP
1> youtube.com
2> videos
3> cats
4> google.com
5> images
6> dogs
*/
Example 1
Match text between first two slashes. Singular input.
import re
regex = r"/(.*?)/"
test_str = "/youtube.com/videos/cats"
matches = re.findall(regex, test_str)
# RESULT: matches == ['youtube.com']
Example 2
Match text between first two slashes. Multiline input.
import re
regex = r"/(.*?)/"
test_str = """
/youtube.com/videos/cats
/google.com/images/dogs
"""
matches = re.findall(regex, test_str)
# RESULT: matches == ['youtube.com', 'google.com']
Example 3
Match text between all slashes. Singular input.
import re
regex = r"/(.*?)(?=$|/)"
test_str = "/youtube.com/videos/cats"
matches = re.findall(regex, test_str)
# RESULT: matches == ['youtube.com', 'videos', 'cats']
Example 4
Match text between all slashes. Multiline input.
import re
regex = r"/(.*?)(?=$|/)"
test_str = """
/youtube.com/videos/cats
/google.com/videos/cats
"""
matches = re.findall(regex, test_str, re.MULTILINE)
# RESULT: matches == ['youtube.com', 'videos', 'cats', 'google.com', 'videos', 'cats']
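Example 5
The examples above return only the text inside the capture group, without the slashes. Since the question asks for the leading slash, the domain, and the second slash together, here is a minimal sketch that drops the capture group so that re.findall returns the full match. It assumes each path starts at the beginning of a line, which is why pattern [1] is anchored with ^ and combined with re.MULTILINE.

```python
import re

# Pattern [1] anchored with ^; re.MULTILINE lets ^ match at the start of every line.
regex = r"^/.*?/"
test_str = """
/youtube.com/videos/cats
/google.com/images/dogs
"""
# With no capture group in the pattern, findall returns the full matches, slashes included.
matches = re.findall(regex, test_str, re.MULTILINE)
print(matches)  # ['/youtube.com/', '/google.com/']
```

Anchoring with ^ also keeps the pattern from picking up later slash pairs on the same line when a path has more segments.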
Upvotes: 0
Reputation:
If you want to use re.sub to remove the text after the second slash, the following may help.
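import re
# Two sample paths, one per line.
data = '''\
/youtube.com/videos/cats
/google.com/images/dogs
'''
# Group 1 holds everything up to and including the second slash;
# .+?$ consumes the rest of the line (re.MULTILINE makes ^ and $ work per line).
pattern = re.compile(r'^(/[^/]+/).+?$', re.MULTILINE)
# Replace each line with group 1 only; prints /youtube.com/ and /google.com/ on separate lines.
print(pattern.sub(r'\1', data))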
Upvotes: 0
Reputation: 24691
The regex I provided in a comment will work. By matching the start of the string with re.match(), you can extract the area that was matched as a group.
>>> your_string = '/google.com/images/dogs'
>>> import re
>>> re.match(r'^/[^/]*/', your_string).group(0)
'/google.com/'
Here's how the regex is laid out:
^ : start of string
/ : a slash character
[^/]* : any number of characters that are not slashes
/ : another slash character
So this regex will capture the first slash, the second slash, and the text in between them, as long as they come at the beginning of the string.
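The same layout can be written directly into the pattern with re.VERBOSE, which ignores whitespace inside the pattern and allows inline comments. This is only a sketch of an equivalent way to spell the regex above; the variable names are illustrative.

```python
import re

# Same regex as above, spelled out piece by piece with re.VERBOSE.
pattern = re.compile(r"""
    ^       # start of string
    /       # a slash character
    [^/]*   # any number of characters that are not slashes
    /       # another slash character
""", re.VERBOSE)

match = pattern.match('/google.com/images/dogs')
print(match.group(0))  # /google.com/
```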
If you instead want the rest of the string, ignoring this first part, add a capture group after it and pull group 1 (the first captured group) instead of group 0 (the entire match):
>>> re.match(r'^/[^/]*/(.*)$', your_string).group(1)
'images/dogs'
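One thing to watch for: re.match() returns None when the string does not start with the expected pattern, so calling .group() directly raises AttributeError on such input. Below is a minimal sketch of a guard, assuming that returning None is acceptable for non-matching input. The helper name leading_segment is hypothetical and not part of the answer above.

```python
import re
from typing import Optional

# Hypothetical helper: returns the text up to and including the second slash,
# or None when the string does not start with "/<segment>/".
def leading_segment(path: str) -> Optional[str]:
    # re.match already anchors at the start of the string, so ^ is optional here.
    match = re.match(r'/[^/]*/', path)
    return match.group(0) if match else None

print(leading_segment('/google.com/images/dogs'))  # /google.com/
print(leading_segment('google.com'))               # None
```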
Upvotes: 2