Leonardo Lana

Reputation: 610

Regex remove json property

I'd like to remove a property from a stringified JSON document based on its key, wherever it appears and whatever its value type is. As a start, though, removing it only when its value is a string and it sits at the root level of the object would be fine. I tried this:

[,]{1}[\s]*?\"attrName\"[ ]*?[:][ ]*?\".*\"[^,]|\"attrName\"[ ]*?[:][ ]*?\".*\"[,]{0,1}

Example: https://regex101.com/r/PAlqYi/1

but it looks far too long for such a simple job. What it does is make sure the comma is removed as well, whether attrName is the first attribute, the last, or somewhere in the middle of the JSON tree. Does anyone have a better idea for making this regex more readable?

Upvotes: 4

Views: 17076

Answers (3)

Tomoe

Reputation: 21

Corrected the previous two answers :D

For this task, the parts of JSON syntax that matter are quotes, colons and commas, so we need to focus on these symbols.

First of all, we need an unescaped quote:

(?<!\\)(?:\\\\)*['"]

The object key in JSON is always a JSON string. A JSON string has the following signature: any content wrapped in two identical unescaped quotes:

(?<!\\)(?:\\\\)*('|").*?(?<!\\)(?:\\\\)*\1

Now let's move on to the object: there are three signatures for an object property:

  1. { property: value } - property\s*:\s*value(?=\s*\}) - no comma, closing brace follows

  2. { ..., property: value } - ,\s*property\s*:\s*value - comma at the beginning

  3. { property: value, ... } - property\s*:\s*value\s*, - comma at the end

Please note that all tokens can be separated by spaces and line breaks - \s*.

We combine all three cases and get the following expression:

(,\s*property\s*:\s*value)|(property\s*:\s*value(,|(?=\s*\})))

Now we substitute a certain property and value into this signature, where the property is the string attr, and the value is any string:

(,\s*(?<!\\)(?:\\\\)*('|")attr(?<!\\)(?:\\\\)*\2\s*:\s*(?<!\\)(?:\\\\)*('|").*?(?<!\\)(?:\\\\)*\3)|((?<!\\)(?:\\\\)*('|")attr(?<!\\)(?:\\\\)*\5\s*:\s*(?<!\\)(?:\\\\)*('|").*?(?<!\\)(?:\\\\)*\6(,|(?=\s*\})))

This solution will work exactly as you expect. The two previous answers work in a similar way, but contain many errors.

https://regex101.com/r/3GKXAs/1
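The derivation above can be sketched in Python by assembling the expression from the same building blocks. This is only an illustration under a few assumptions: the helper names (json_string, property_pattern, remove_string_property) are hypothetical, named groups replace the numbered backreferences so the pieces can be reused, and, like the regex itself, it only targets string values and does not know whether a match sits inside another string.

```python
import json
import re


def json_string(group, body):
    """Regex for `body` wrapped in two identical unescaped quotes.

    `group` (hypothetical helper argument) names the capture group that
    remembers which quote character opened the string.
    """
    opening = rf"(?<!\\)(?:\\\\)*(?P<{group}>['\"])"
    closing = rf"(?<!\\)(?:\\\\)*(?P={group})"
    return opening + body + closing


def property_pattern(key):
    """Combine the comma-first case with the comma-last / closing-brace case."""
    def prop(tag):
        name = json_string(f"k{tag}", re.escape(key))
        value = json_string(f"v{tag}", r".*?")  # any string value, lazily
        return rf"{name}\s*:\s*{value}"

    return re.compile(rf"(?:,\s*{prop(1)})|(?:{prop(2)}(?:,|(?=\s*\}})))")


def remove_string_property(text, key):
    """Delete every `key: "string"` pair, together with one adjacent comma."""
    return property_pattern(key).sub("", text)


samples = [
    '{"attr": "x"}',                            # only property
    '{"a": 1, "attr": "x"}',                    # last property
    '{"attr": "x", "b": 2}',                    # first property
    r'{"a": 1, "attr": "say \"hi\"", "b": 2}',  # escaped quotes in the value
]
for sample in samples:
    cleaned = remove_string_property(sample, "attr")
    # json.loads raises if a stray comma was left behind
    print(cleaned, "->", json.loads(cleaned))
```

Parsing each result with json.loads is a cheap way to confirm that no dangling comma remains after the substitution.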

Upvotes: 2

Alex Collins

Reputation: 570

If you have any way of using a parser, that is a more stable and readable solution. Otherwise, the regex \s*\"attr\" *: *\".*\"(,|(?=\s*\})) should be shorter and better.

Example
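For the parser route mentioned at the start of this answer, a minimal sketch in Python might look like the following. The language choice and the function names (remove_key, strip_property) are assumptions made for illustration; the idea is simply to parse, drop the key at every nesting level, and serialize again.

```python
import json


def remove_key(value, key):
    """Return a copy of a parsed JSON value with every `key` entry removed."""
    if isinstance(value, dict):
        return {k: remove_key(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [remove_key(item, key) for item in value]
    return value  # strings, numbers, booleans and null are left untouched


def strip_property(text, key):
    """Parse the JSON text, drop `key` wherever it occurs, and serialize it again."""
    return json.dumps(remove_key(json.loads(text), key))


raw = '{"a": 1, "attrName": "x, y", "b": {"attrName": [1, 2], "c": "ok"}}'
print(strip_property(raw, "attrName"))
```

Because the text is parsed first, commas, nesting and the value type need no special handling; the trade-off is that the original whitespace and formatting are not preserved.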

Several changes I made to help:

  1. Don't use so many character classes like [,]. If there is only one element in a character class it should be left by itself.
  2. Only use numbered counts when required. Ex: {0,1} is ? and {1} is pointless.
  3. Instead of looking for a comma on the previous line to detect the end of a list, check whether a } follows the line; that lets you group the conditions together.
  4. A positive lookahead is used at the end to find the } so that it isn't removed during the substitution.

Update with the bugfix mentioned in the comments: a trailing comma was left behind if the attribute was last. The simplest way I found to fix this was to match both cases, so you'll have to fill in attr twice.

(,\s*\"attr\" *: *\".*\"|(?=\s*\}))|(\s*\"attr\" *: *\".*\"(,|(?=\s*\})))

Examples with added test cases
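To see the trailing-comma issue and the update side by side, both patterns can be tried with Python's re module. This is a rough sketch: the constant names SHORT and UPDATED are made up here, and the \" escapes are written as plain " because a Python raw string does not need them. Both patterns use a greedy .*, so the samples deliberately contain a single string value each; inputs with several string values on one line may match more than intended.

```python
import re

# Patterns copied from the answer, written as Python raw strings.
SHORT = re.compile(r'\s*"attr" *: *".*"(,|(?=\s*\}))')
UPDATED = re.compile(
    r'(,\s*"attr" *: *".*"|(?=\s*\}))|(\s*"attr" *: *".*"(,|(?=\s*\})))'
)

last = '{"a": 1, "attr": "x"}'   # attr is the last property
first = '{"attr": "x", "b": 2}'  # attr is the first property

print(SHORT.sub("", last))     # the comma before attr is left in place
print(UPDATED.sub("", last))   # the leading comma is consumed as well
print(UPDATED.sub("", first))  # the trailing comma is consumed
```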

Upvotes: 8

Artem Tiumentcev

Reputation: 41

I modified the regex from the first example; it works better, even with flat JSON.

\s*\"attr\" *: *(\"(.*?)\"(,|\s|)|\s*\{(.*?)\}(,|\s|))

Example
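A quick way to try this pattern is with Python's re module. The sketch below is only an illustration: the name PATTERN is an assumption, the \" escapes are written as plain ", and the samples cover only a string value and a flat object value in first position, not the case where attr is the last property.

```python
import json
import re

# Pattern copied from the answer, written as a Python raw string.
PATTERN = re.compile(r'\s*"attr" *: *("(.*?)"(,|\s|)|\s*\{(.*?)\}(,|\s|))')

string_value = '{"attr": "x", "b": "y"}'
object_value = '{"attr": {"k": 1}, "b": 2}'

for sample in (string_value, object_value):
    cleaned = PATTERN.sub("", sample)
    # json.loads raises if the result is no longer valid JSON
    print(cleaned, "->", json.loads(cleaned))
```

The lazy .*? stops at the first closing quote or brace, which is why the second string value in the first sample survives.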

Upvotes: 4
