python, regex

SimpleThings

Reputation: 167

Extracting a word between two path separators that comes after a specific word

I have the following path stored as a Python string 'C:\ABC\DEF\GHI\App\Module\feature\src' and I would like to extract the word Module, which sits between \App\ and \feature\ in the path. The path separators '\' around it should not be part of the result; only the string Module should be extracted.

I had a few ideas on how to do it:

1. Write a RegEx that matches a string between \App\ and \feature\.
2. Write a RegEx that matches the component after \App\ (App\\[A-Za-z0-9]*\\), and then split the matched string to get Module.
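A minimal sketch of the second idea, assuming the path is held in a variable named `path` (a name chosen here for illustration): match `App\` plus the following component, then split the match on the backslash.

```python
import re

# The Python literal below contains single backslashes, as in the original path.
path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'

# Idea 2: match "App\", one alphanumeric component, and the trailing "\".
match = re.search(r"App\\[A-Za-z0-9]*\\", path)
if match:
    # match.group(0) is "App\Module\"; splitting on "\" gives ["App", "Module", ""].
    module = match.group(0).split("\\")[1]
    print(module)  # Module
```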

I think the first solution is better, but unfortunately it is beyond my RegEx knowledge and I am not sure how to do it.

I would much appreciate any help.

Thank you in advance!

Upvotes: 0

Answers (3)

butterflyknife

Reputation: 1574

The regex you want is:

(?<=\\App\\).*?(?=\\feature\\)

Explanation of the regex:

(?<=behind)rest matches rest only if behind appears immediately before it. This is called a positive lookbehind.
rest(?=ahead) matches rest only if ahead appears immediately after it. This is a positive lookahead.
\ is a special character in regex patterns, so to match a literal backslash we have to escape it as \\.
.* matches any character (except a newline), zero or more times.
The ? after .* makes the match non-greedy, so it stops at the first \feature\ after \App\ (we are implicitly assuming here that \feature\ shows up only once after \App\).
The pattern also assumes that there are no \ characters between \App\ and \feature\, i.e. exactly one path component sits between them.
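To see why the non-greedy ? and the no-backslash assumption matter, here is a small sketch on two hypothetical paths (invented for illustration): one where \feature\ appears twice, and one with two components between \App\ and \feature\.

```python
import re

lazy = r"(?<=\\App\\).*?(?=\\feature\\)"
greedy = r"(?<=\\App\\).*(?=\\feature\\)"

# Hypothetical path in which \feature\ occurs twice after \App\.
twice = 'C:\\App\\Module\\feature\\x\\feature\\src'
print(re.search(lazy, twice)[0])    # Module  (stops at the first \feature\)
print(re.search(greedy, twice)[0])  # Module\feature\x  (runs to the last \feature\)

# Hypothetical path with two components between \App\ and \feature\.
nested = 'C:\\App\\Sub\\Module\\feature\\src'
print(re.search(lazy, nested)[0])   # Sub\Module  (.*? also matches backslashes)
```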

The full code would be something like:

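import re

path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
start = '\\App\\'
end = '\\feature\\'

# re.escape turns each backslash in the markers into an escaped \\ in the pattern
pattern = rf"(?<={re.escape(start)}).*?(?={re.escape(end)})"

print(pattern)                            # (?<=\\App\\).*?(?=\\feature\\)
print(re.search(pattern, path)[0])        # Module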

A link on regex lookarounds that may be helpful: https://www.regular-expressions.info/lookaround.html
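One caveat with indexing the result of re.search directly: when nothing matches, re.search returns None, and None[0] raises a TypeError. A hedged sketch of a guarded version follows; the function name extract_between is hypothetical, and re.escape is used so the markers can be passed as plain strings.

```python
import re
from typing import Optional

def extract_between(path: str, start: str, end: str) -> Optional[str]:
    """Return the text between `start` and the nearest following `end`, or None."""
    # re.escape turns each backslash in the markers into an escaped backslash.
    pattern = rf"(?<={re.escape(start)}).*?(?={re.escape(end)})"
    match = re.search(pattern, path)
    return match.group(0) if match else None

print(extract_between('C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src', '\\App\\', '\\feature\\'))  # Module
print(extract_between('C:\\ABC\\src', '\\App\\', '\\feature\\'))  # None
```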

Upvotes: 3

schlin

Reputation: 369

You are looking for groups. With some small modifications you can extract only the part between App and feature.

(?:App\\\\)([A-Za-z0-9]*)(?:\\\\feature)

The parentheses ( ) define a capturing group, whose content you can get with match.group(1). (?:foo) defines a non-capturing group: it still has to match, but it is not returned as a separate group. Try the expression here: https://regex101.com/r/24mkLO/1
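A minimal sketch of using this pattern from Python with match.group(1). It assumes a raw string literal, where each \\ in the pattern matches one literal backslash in the path. Typed into a regex tester as-is, \\\\ matches two consecutive backslashes; in a regular (non-raw) Python string literal it becomes \\, which matches one.

```python
import re

path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'

# Raw string: each \\ in the pattern matches one literal backslash in the path.
pattern = r"(?:App\\)([A-Za-z0-9]*)(?:\\feature)"

match = re.search(pattern, path)
if match:
    print(match.group(0))  # App\Module\feature  (the whole match)
    print(match.group(1))  # Module  (only the capturing group)
```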

Upvotes: 2

Bhargav

Reputation: 4062

We can also do this with str.find, something like:

path = 'C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src'
start = '\\App\\'
end = '\\feature\\'

# Slice from just after the first start marker up to the last end marker
print(path[path.find(start) + len(start):path.rfind(end)])

Output:

Module
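str.find and str.rfind return -1 when the substring is missing, so the slice above silently produces the wrong text in that case. A hedged sketch with an explicit check follows; between is a hypothetical helper name. It also searches for end only after start (find with a start index instead of rfind), which picks the nearest \feature\ rather than the last one.

```python
from typing import Optional

def between(path: str, start: str, end: str) -> Optional[str]:
    """Return the text between `start` and the next `end`, or None if either is missing."""
    i = path.find(start)
    if i == -1:
        return None
    i += len(start)
    j = path.find(end, i)  # look for `end` only after `start`
    if j == -1:
        return None
    return path[i:j]

print(between('C:\\ABC\\DEF\\GHI\\App\\Module\\feature\\src', '\\App\\', '\\feature\\'))  # Module
print(between('C:\\ABC\\src', '\\App\\', '\\feature\\'))  # None
```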

Upvotes: 2
