ArtW

Reputation: 63

My regular expression isn't returning what I need

I have a block of text like this:

google.sbox.p50 && google.sbox.p50(["how to",[["how to tie a tie",0],["how to train your dragon 2 trailer",0],["how to do the cup song",0],["how to get a six pack in 3 minutes",0],["how to make a paper gun that shoots",0],["how to basic",0],["how to love lil wayne",0],["how to sing like your favorite artist",0],["how to be a heartbreaker marina and the diamonds",0],["how to tame a horse in minecraft",0]],{"q":"XJW--0IKH6sqOp0ME-x5B7b_5wY","j":"5","k":1}])

Using \\[([^]]+)\\] I am able to get everything I need, plus a little extra that I don't. I do not need the ["how to",[[ part. I only need the blocks that are formatted like this:

["how to tie a tie",0]

Can someone please help me modify my expression to only get what I need? I've been at it for hours and I can't grasp the idea of RegEx.

Upvotes: 3

Views: 297

Answers (5)

Smern

Reputation: 19066

I think this is what you are looking for to match the format of ["how to tie a tie",0]:

(\["[^"]+",\d\])

( ) - around the whole thing so it all gets captured in this group
\[" - find ["
[^"]+ - find one or more of anything except "
", - find ",
\d - find a single digit; if the number can have more than one digit, use \d+
\] - match the ending ]

The only variable things in this regex are whatever is within the quotes ([^"]+) and the number (\d+).

Demo

If you don't want the square brackets in the capture group, you can do it like this:

\[("[^"]+",\d+)\]

I assume you don't want to match if there are quotes within your quotes as it would probably break whatever purpose you are using it for, but if you do, this should work:

\[("[^[\]]+",\d+)\]
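As a rough illustration of the second pattern above, a minimal C# sketch might look like the following. The shortened input string and the variable names are made up for this example, and it assumes the number may have more than one digit (hence \d+).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened sample input (hypothetical; same shape as the text in the question).
string input = "google.sbox.p50 && google.sbox.p50([\"how to\",[[\"how to tie a tie\",0],[\"how to basic\",0]],{\"q\":\"abc\",\"j\":\"5\",\"k\":1}])";

// In a verbatim string (@"..."), a literal double quote is written as "".
var pairPattern = new Regex(@"\[(""[^""]+"",\d+)\]");

// Group 1 holds the text between the square brackets.
string[] pairs = pairPattern.Matches(input)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

foreach (string pair in pairs)
{
    Console.WriteLine(pair);
}
// Prints:
// "how to tie a tie",0
// "how to basic",0
```

The leading ["how to",[[ is skipped because the pattern requires a digit right after the comma, and there the comma is followed by another [.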

Upvotes: 1

Jerry

Reputation: 71538

Put both the opening and closing square brackets in the negated character class?

\\[([^][]+)\\]

\\[ matches a literal [

\\] matches a literal ]

[^][] is a negated class, which matches any character except ] and [. It might be a little difficult to see, but it's equivalent to [^\\]\\[]. Here the double escapes are not required because the brackets are inside a character class, where a ] placed first and a [ are taken literally (just like \\. is equivalent to [.])

([^][]+) captures everything within square brackets, making sure there's no ] or [ inside.

In C#, you can use the @ symbol (a verbatim string) to avoid having to double escape every time, which makes the regex look like this:

var regex = new Regex(@"\[([^][]+)\]");

Note: This regex will capture everything within square brackets. If you wish to specifically get the format ["how to tie a tie",0], you can be more precise. After all, the regex will only match what you make it match:

var regex = new Regex(@"\[""[^""]+"",0\]");

Here, we have another negated character class: [^"]. This will match any character which is not a quote character. Inside a C# verbatim string, each literal quote has to be written as "".

This one assumes that the digit is always 0, as depicted in your sample text block. If you have multiple possibilities of numbers, you can use the character class [0-9]+:

var regex = new Regex(@"\[""[^""]+"",[0-9]+\]");

You can use \d+ as well, but in .NET \d also matches decimal digits from other scripts (any Unicode digit), which may or may not be what you want. If you want to be even more cautious and allow possible spaces, tabs, newlines or form feeds between the tokens, you can use this regex:

var regex = new Regex(@"\[\s*""[^""]+""\s*,\s*[0-9]+\s*\]");

In conclusion, there might be many regexes which suit what you need; just make sure you know how your data is coming through so you can pick one which has the right amount of leeway.
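To make the whitespace-tolerant variant concrete, here is a small C# sketch. The input string is invented for the example (it adds irregular spacing and a multi-digit number), and the named groups phrase and number are additions for this sketch, not part of the pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input with irregular spacing, to exercise the \s* parts.
string text = "[\"how to\",[[\"how to tie a tie\",0],[ \"how to basic\" , 12 ]],{\"k\":1}]";

// Same shape as the whitespace-tolerant pattern, with two named groups added.
// In a verbatim string, a literal double quote is written as "".
var tolerant = new Regex(@"\[\s*""(?<phrase>[^""]+)""\s*,\s*(?<number>[0-9]+)\s*\]");

var found = new List<string>();
foreach (Match m in tolerant.Matches(text))
{
    string phrase = m.Groups["phrase"].Value;
    string number = m.Groups["number"].Value;
    found.Add(phrase + " -> " + number);
}

foreach (string line in found)
{
    Console.WriteLine(line);
}
// Prints:
// how to tie a tie -> 0
// how to basic -> 12
```

Named groups are just one way to pull the two parts apart; if only the whole ["...",n] block is needed, m.Value is enough.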

Upvotes: 3

Csaba Toth

Reputation: 10697

Seemingly the text inside the outer parentheses is JSON (an array). Instead of a regular expression I'd just:

  1. Strip off everything up to and including the opening parenthesis (google.sbox.p50 && google.sbox.p50() and also strip off the trailing closing parenthesis ). There are several ways to do this, and it can be more efficient than a regex.
  2. JSON parse the remaining inner part.
  3. From that point you have the object representation: you can leave out the first element of the array, which you don't need, and you have everything else in a traversable form.

There's the session information at the end along with parameters anyway (in {} brackets), so in the end you may end up parsing stuff anyway. Better not to reinvent the wheel (JSON parsing).
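The three steps above could be sketched in C# roughly as follows. This assumes System.Text.Json is available (.NET Core 3.0 or later), and the shortened response string and variable names are made up for the example; any other JSON library would do as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Shortened sample response (hypothetical; same shape as the text in the question).
string response = "google.sbox.p50 && google.sbox.p50([\"how to\",[[\"how to tie a tie\",0],[\"how to basic\",0]],{\"q\":\"abc\",\"j\":\"5\",\"k\":1}])";

// Step 1: keep only what sits between the first '(' and the last ')'.
int open = response.IndexOf('(');
int close = response.LastIndexOf(')');
string json = response.Substring(open + 1, close - open - 1);

// Step 2: parse the remaining text as JSON.
using JsonDocument doc = JsonDocument.Parse(json);

// Step 3: element 0 is the query itself, element 1 is the array of
// [phrase, number] pairs; collect the phrase from each pair.
var suggestions = new List<string>();
foreach (JsonElement pair in doc.RootElement[1].EnumerateArray())
{
    suggestions.Add(pair[0].GetString() ?? "");
}

foreach (string s in suggestions)
{
    Console.WriteLine(s);
}
// Prints:
// how to tie a tie
// how to basic
```

The trailing {"q":...} object is then available as doc.RootElement[2] if the session parameters are needed too.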

Upvotes: 0

Jeroen van Langen

Reputation: 22038

I think you need this one: (\[[^\[^]+?])

What you missed is the ? (lazy quantifier, smallest possible match) and excluding any [ or ] from the part between the brackets.

Upvotes: 0

Casimir et Hippolyte

Reputation: 89547

You must use this pattern

@"\[[^][]+\]"

More information about square brackets here.

Upvotes: 0
