tim

Reputation: 21

Trying to match what is before /../ but after / with regular expressions

I am trying to match what comes before /../ but after the preceding / with a regular expression. I want it to look back from /../ and stop at the first / it meets.

I feel like I am close, but my pattern latches onto the first slash in the string and takes everything after it. For example, the input is this:

this/is/a/./path/that/../includes/face/./stuff/../hat

and my regular expression is:

#\/(.*)\.\.\/#

This matches /is/a/./path/that/../includes/face/./stuff/../ instead of just that/../ and stuff/../

How should I change my regex to make it work?

Upvotes: 2

Views: 199

Answers (7)

dawg

Reputation: 104082

([^/]+) will capture all the text between slashes.

([^/]+)*/\.\. matches that/.. and stuff/.. in your string this/is/a/./path/that/../includes/face/./stuff/../hat. It captures that or stuff, and you can change that, obviously, by changing the placement of the capturing parens and your program logic.

You didn't state whether you want to capture or just match. The regex here will only capture the last occurrence of the match (stuff), but it is easily changed to return that and then stuff if used in a global match.

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1 (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^/]+                    any character except: '/' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of \1 (NOTE: because you're using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  \.                       '.'
--------------------------------------------------------------------------------
  \.                       '.'
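
A minimal sketch of the "global match" idea in Python, assuming the standard re module and a simplified pattern, ([^/]+)/\.\. (the outer * is dropped here for clarity; that simplification is mine, not part of the answer above):

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Simplified pattern (an assumption for this sketch): one path segment
# made of non-slash characters, followed by "/..".
pattern = re.compile(r"([^/]+)/\.\.")

# finditer yields every non-overlapping match from left to right,
# so each segment that precedes "/.." is captured in turn.
segments = [m.group(1) for m in pattern.finditer(path)]
print(segments)  # ['that', 'stuff']
```

Moving the parentheses, for example to ([^/]+/\.\.), would capture the trailing /.. as well.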

Upvotes: 0

maček

Reputation: 77806

Alternatively, you can use a lookahead.

#(\w+)(?=/\.\./)#

Explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    /                        '/'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    /                        '/'
--------------------------------------------------------------------------------
  )                        end of look-ahead
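
As a rough illustration, the same lookahead can be tried in Python. This sketch assumes the standard re module and drops the # delimiters, which Python does not use:

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# The lookahead asserts that "/../" follows the word characters,
# but it is zero-width, so "/../" is not part of the match itself.
pattern = re.compile(r"(\w+)(?=/\.\./)")

names = pattern.findall(path)
print(names)  # ['that', 'stuff']
```

One caveat: \w does not match characters such as - or ., so a segment like my-dir would only be captured from dir onward. Swapping \w+ for [^/]+ is one way around that.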

Upvotes: 1

Dave Sherohman

Reputation: 46225

.* means "match any number of any character at all[1]". This is not what you want. You want to match any number of non-/ characters, which is written [^/]*.

Any time you are tempted to use .* or .+ in a regex, be very suspicious. Stop and ask yourself whether you really mean "any character at all[1]" or not - most of the time you don't. (And, yes, non-greedy quantifiers can help with this, but character classes are both more efficient for the regex engine to match against and more clear in their communication of your intent to human readers.)

[1] OK, OK... . isn't exactly "any character at all" - it doesn't match newline (\n) by default in most regex flavors - but close enough.
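
To make the difference concrete, here is a small Python sketch (assuming the standard re module) that runs the question's greedy pattern and a [^/]* variant against the sample input:

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Greedy ".*" starts at the first "/" and runs to the LAST "../" it can reach.
greedy = re.search(r"/(.*)\.\./", path)
print(greedy.group(0))  # /is/a/./path/that/../includes/face/./stuff/../

# "[^/]*" cannot cross a slash, so each match is limited to one segment.
segments = re.findall(r"/([^/]*)/\.\./", path)
print(segments)  # ['that', 'stuff']
```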

Upvotes: 2

user297250

Reputation:

In python:

>>> import re
>>> test = 'this/is/a/./path/that/../includes/face/./stuff/../hat'
>>> regex = re.compile(r'/\w+?/\.\./')
>>> regex.findall(test)
['/that/../', '/stuff/../']

Or if you just want the text without the slashes:

>>> regex = re.compile(r'/(\w+?)/\.\./')
>>> regex.findall(test)
['that', 'stuff']

Upvotes: 0

Gumbo

Reputation: 655677

Change your pattern so that only characters other than / ([^/]) get matched:

#([^/]*)/\.\./#

Upvotes: 1

ghostdog74

Reputation: 342899

In your favourite language, do a few splits and some string manipulation, e.g. in Python:

>>> s="this/is/a/./path/that/../includes/face/./stuff/../hat"
>>> a=s.split("/../")[:-1]  # the last item is not required.
>>> for item in a:
...   print(item.split("/")[-1])
...
that
stuff

Upvotes: 0

Michael Mrozek

Reputation: 175675

I think you're essentially right, you just need to make the match non-greedy, or change the (.*) to not allow slashes: #/([^/]*)/\.\./#
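
It may be worth trying both options against the sample input before choosing one. In this Python sketch (assuming the standard re module), the non-greedy version still begins at the leftmost slash and so spans several segments, while the negated character class gives single segments:

```python
import re

path = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Non-greedy: stops at the first "/../" it can reach, but the match
# still STARTS at the leftmost "/", so slashes end up inside the group.
lazy = re.findall(r"/(.*?)/\.\./", path)
print(lazy)  # ['is/a/./path/that', 'face/./stuff']

# Negated class: the group cannot contain a slash at all.
strict = re.findall(r"/([^/]*)/\.\./", path)
print(strict)  # ['that', 'stuff']
```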

Upvotes: 0
