Garfield910

Reputation: 13

Match between simple delimiters, but not delimiters themselves

I was looking at JSON data stored in a plain text file. I don't want to parse it; I just want to use regex to get the values between the quotes. I'm using this as regex practice and hit a point that seems like it should be simple but isn't (at least for me and a few other people at the office). I've matched complicated URLs with regex before, so I'm not completely new to it. This just seems like a weird case to me.

I've tried:

/(?:")(.*?)(?:")/

/"(.*?)"/

and several others but these got me the closest.

Basically we can forget that it's JSON and just say I want to match the words value and stuff out of "value" and "stuff". Everything I try includes the quotes in the match, so I'd have to strip the delimiters from the strings afterwards, or else the string is literally "value" with the quotes.

Any help would be much appreciated, whether this is simple or complicated, I'd love to know! Thanks

Update: Alright, I think I'll go with (?<=")(.*?)(?=") and read the input line by line without the global flag, so I only get the first match on each line. In my code I was just pasting a huge string into a variable instead of actually opening a file with ajax/FileReader or setting up a form to input data. I'll mark this as solved, much appreciated!

Upvotes: 0

Views: 68

Answers (2)

joanis

Reputation: 12290

You have two choices to solve this problem:

Use capturing groups

You can match the delimiters and use a capturing group to get the text within. In this case both of your regexes will work, but you need to access capturing group 1 to get the results (demo). See "How do you access the matched groups in a JavaScript regular expression?" for how to do that.
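As a minimal sketch of reading group 1 (the sample string and variable names are made up for illustration), `matchAll` yields one match object per hit, where index 0 is the full match including the quotes and index 1 is the captured text:

```javascript
// Hypothetical sample input; any text with quoted values would do.
const text = '{"value": "stuff"}';

// The quotes are matched (consumed) but only the inner text is captured.
const pattern = /"(.*?)"/g;

// matchAll requires the g flag; m[0] includes the quotes, m[1] does not.
const values = Array.from(text.matchAll(pattern), (m) => m[1]);

console.log(values); // [ 'value', 'stuff' ]
```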

Use zero-width assertions

You can use zero-width assertions to match only the text within, requiring the delimiters around it without actually matching them (demo):

(?<=")(.*?)(?=")

Note that since the quotes are not consumed, this finds text between every pair of neighbouring quotes, not just between opening and closing pairs: e.g., a"b"c" would find both b and c.
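The difference can be seen by running both styles of pattern over the a"b"c" example. This is only a sketch with illustrative variable names, and it assumes a JavaScript engine that supports lookbehind (current Node.js and modern browsers do):

```javascript
const sample = 'a"b"c"';

// Lookarounds: the quotes are checked but not consumed, so the quote that
// closes b is reused as the quote that opens c.
const found = sample.match(/(?<=")(.*?)(?=")/g);

// Capturing group: both quotes are consumed, so after "b" only c" is left
// and there is no second pair to match.
const consumed = Array.from(sample.matchAll(/"(.*?)"/g), (m) => m[1]);

console.log(found);    // [ 'b', 'c' ]
console.log(consumed); // [ 'b' ]
```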

As for getting just the first match, that is the default in JavaScript: without the g flag, a regex stops at the first match, and you have to ask for repeated matching before you see the subsequent ones. So if you process your file one line at a time, you should get what you want.
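A rough sketch of the line-by-line approach, assuming the lines have already been read into an array (the `lines` content here is invented for illustration):

```javascript
// Hypothetical input, one entry per line of the file.
const lines = ['"value": 1', '"stuff": "other"'];

// No g flag, so match() returns only the first match on each line.
const firstQuoted = /(?<=")(.*?)(?=")/;

const firsts = lines.map((line) => {
  const m = line.match(firstQuoted);
  // match() returns null when a line has no quoted text.
  return m ? m[0] : null;
});

console.log(firsts); // [ 'value', 'stuff' ]
```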

Upvotes: 1

ctaleck

Reputation: 1665

get the values in between quotes

One thing to keep in mind is that valid JSON allows escaped quotes inside quoted values. The regex should therefore treat a backslash followed by any character as part of the string rather than as a possible closing quote. This can be done with the "unrolling-the-loop" pattern:

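// Opening quote, a run of ordinary characters, then any number of
// (escaped character + run of ordinary characters), then the closing quote.
const pattern = /"[^"\\]*(?:\\.[^"\\]*)*"/g;
const data = {
  "value": "This is \"stuff\".",
  "empty": "",
  "null": null,
  "number": 50
};
const dataString = JSON.stringify(data);
console.log(dataString);
// match() returns null when nothing matches, so fall back to an empty array.
const matched = dataString.match(pattern) || [];
// Each match (keys included) is a valid JSON string literal, so JSON.parse
// strips the surrounding quotes and decodes the escapes.
matched.forEach(item => console.log(JSON.parse(item)));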
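To see why the escape handling matters, here is a small comparison on a single string literal containing escaped quotes. It is only a sketch: the `raw` value is made up, and `String.raw` is used so the backslashes stay in the string exactly as written.

```javascript
// The text as it would appear in a JSON file: "This is \"stuff\"."
const raw = String.raw`"This is \"stuff\"."`;

// The simple lazy pattern stops at the first quote it sees, even an escaped one.
const naive = raw.match(/"(.*?)"/)[1];

// The unrolled pattern skips over backslash-escaped characters.
const unrolled = raw.match(/"[^"\\]*(?:\\.[^"\\]*)*"/)[0];

console.log(naive);                // This is \
console.log(JSON.parse(unrolled)); // This is "stuff".
```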

Upvotes: 0
