Reputation: 157
I'm using NiFi and I have a series of JSONs that look like this:
{
"url": "RETURNED URL",
"repository_url": "RETURNED URL",
"labels_url": "RETURNED URL",
"comments_url": "RETURNED URL",
"events_url": "RETURNED URL",
"html_url": "RETURNED URL",
"id": "RETURNED_ID",
"node_id": "RETURNED id",
"number": 10,
...
"author_association": "xxxx",
"active_lock_reason": null,
"body": "text text text, text text, text text text, text, text text",
"performed_via_github_app": null
}
My focus is on the "body" attribute. Because I'm merging them into one giant JSON to convert into a CSV, I need the commas within the "body" text removed (which should also help with possible NLP later down the road). I know I can use the ReplaceText processor, but writing a regex that captures only those commas is the part I'm struggling with. So far I have the following:
((?<="body"\s:\s").*(?=",))
Every guide I look at, though, doesn't match the commas within the quotes. Any suggestions?
Upvotes: 1
Views: 105
Reputation: 626861
You can use
(\G(?!^)|\"body\"\s*:\s*\")([^\",]*),
If the string may contain escape sequences, use
(\G(?!^)|\"body\"\s*:\s*\")([^\",\\]*(?:\\.[^\",\\]*)*),
See the regex demo (and regex demo #2); replace with $1$2.
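To check the replacement outside NiFi, here is a minimal Java sketch, assuming ReplaceText applies ordinary java.util.regex semantics. The class name BodyCommaStripper, the method stripBodyCommas and the sample JSON strings are made up for illustration; only the two patterns and the $1$2 replacement come from the answer above.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying the two patterns; not part of NiFi.
class BodyCommaStripper {
    // First pattern: assumes the "body" value contains no escape sequences.
    static final Pattern SIMPLE =
        Pattern.compile("(\\G(?!^)|\"body\"\\s*:\\s*\")([^\",]*),");

    // Second pattern: also steps over escape sequences such as \" inside the value.
    static final Pattern WITH_ESCAPES =
        Pattern.compile("(\\G(?!^)|\"body\"\\s*:\\s*\")([^\",\\\\]*(?:\\\\.[^\",\\\\]*)*),");

    // Keeps groups 1 and 2 of every match and drops the comma that follows them.
    static String stripBodyCommas(Pattern pattern, String json) {
        return pattern.matcher(json).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // Made-up sample input without escape sequences.
        String plain = "{\"id\": \"a,b\", \"body\": \"text text, text, more\", \"x\": null}";
        System.out.println(stripBodyCommas(SIMPLE, plain));

        // Made-up sample input with escaped quotes inside the body.
        String escaped = "{\"body\": \"say \\\"hi, there\\\", ok\", \"n\": 1}";
        System.out.println(stripBodyCommas(WITH_ESCAPES, escaped));
    }
}
```

In both samples only the commas inside the "body" value are removed. The comma inside "id" and the commas separating the keys stay, because \G(?!^) can only continue from the end of a previous match that started at "body".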
Details:
(\G(?!^)|\"body\"\s*:\s*\") - Group 1 ($1): either the end of the previous match, or "body", zero or more whitespaces, :, zero or more whitespaces, and the opening "
([^\",]*) - Group 2 ($2): any zero or more chars other than " and ,
, - a comma (to be removed/replaced).
Upvotes: 1