domjanzsoo

Reputation: 141

Extract content of code which starts with a curly bracket and ends with a curly bracket followed by a closing parenthesis

I'm completely lost with regular expressions right now (lack of practice). I'm writing a Node script that goes through a bunch of JS files. Each file calls a function, and one of the arguments is a JSON object. The aim is to get all those JSON arguments and place them in one file. The problem I'm facing at the moment is extracting the argument part of the code. Here is the function-call part of that string:

$translateProvider.translations('de', {
        WASTE_MANAGEMENT: 'Abfallmanagement',
        WASTE_TYPE_LIST: 'Abfallarten',
        WASTE_ENTRY_LIST: 'Abfalleinträge',
        WASTE_TYPE: 'Abfallart',
        TREATMENT_TYPE: 'Behandlungsart',
        TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
        DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
        TREATMENT_TYPE_LIST: 'Behandlungsarten',
        TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
        TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
        SITE_TARGET: 'Gebäudeziel',
        WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
        WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
        WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
        WASTE_TYPE_ADD: 'Abfallart hinzufügen',
        UNIT_ADD: 'Einheit hinzufügen'
})

So I'm trying to write a regular expression that matches the segment of the JS code which starts with "'de', {" and ends with "})", and which can have any characters in between (single/double curly brackets included). I tried something like \'de'\s*,\s*{([^}]*)})\ , but that doesn't work. The furthest I got was with \'de'\s*,\s*{([^})]*)}\ , but this ends at the first closing curly bracket within the JSON, which is not what I want. It seems I have completely forgotten even the concepts of regular expressions I understood before. Any help is much appreciated.

Upvotes: 0

Views: 144

Answers (2)

Peter Thoeny

Reputation: 7616

You did not state the desired output. Here is a solution that parses the text, and creates an array of arrays. You can easily transform that to a desired output.

const input = `$translateProvider.translations('de', {
        WASTE_MANAGEMENT: 'Abfallmanagement',
        WASTE_TYPE_LIST: 'Abfallarten',
        WASTE_ENTRY_LIST: 'Abfalleinträge',
        WASTE_TYPE: 'Abfallart',
        TREATMENT_TYPE: 'Behandlungsart',
        TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
        DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
        TREATMENT_TYPE_LIST: 'Behandlungsarten',
        TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
        TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
        SITE_TARGET: 'Gebäudeziel',
        WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
        WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
        WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
        WASTE_TYPE_ADD: 'Abfallart hinzufügen',
        UNIT_ADD: 'Einheit hinzufügen'
})`;

const regex1 = /\.translations\([^{]*\{\s+(.*?)\s*\}\)/s;
const regex2 = /',[\r\n]+\s*/;
const regex3 = /: +'/;
let result = [];
let m = input.match(regex1);
if(m) {
  result = m[1].split(regex2).map(line => line.split(regex3));
}
console.log(result);

Explanation of regex1:

  • \.translations\( -- literal .translations(
  • [^{]* -- anything not {
  • \{\s+ -- literal { followed by one or more whitespace characters
  • (.*?) -- capture group 1 with non-greedy scan up to:
  • \s*\}\) -- whitespace, followed by })
  • s flag to make . match newlines

Explanation of regex2:

  • ',[\r\n]+\s* -- literal ', (quote and comma), followed by one or more newlines and any leading whitespace of the next line (to split lines)

Explanation of regex3:

  • : +' -- a colon, one or more spaces, and a ' (to split key/value)
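
Since the goal in the question is to collect the translations into one file, here is a minimal sketch of one way to turn the array of arrays into an object and then into JSON text. The helper name pairsToObject and the shortened sample data are assumptions for illustration only. Note that the last value coming out of the snippet above still carries its closing quote, because regex2 only consumes a quote that is followed by a comma and a newline, so the sketch strips one trailing quote if present.

```javascript
// Sample shaped like the output of the snippet above (shortened).
// The last value keeps its trailing quote, as described in the lead-in.
const pairs = [
  ['WASTE_TYPE', 'Abfallart'],
  ['WASTE_TREATMENT_TARGETS', '{{Abfallbehandlungsziele}}'],
  ['UNIT_ADD', "Einheit hinzufügen'"],
];

// Hypothetical helper: build a plain object from [key, value] pairs,
// trimming the key and dropping a single trailing quote from the value.
function pairsToObject(keyValuePairs) {
  return Object.fromEntries(
    keyValuePairs.map(([key, value]) => [key.trim(), value.replace(/'$/, '')])
  );
}

const translations = pairsToObject(pairs);

// Pretty-printed JSON text, ready to be written to a file.
const json = JSON.stringify(translations, null, 2);
console.log(json);
```

This assumes no translation value legitimately ends with a single quote. If some do, it may be safer to strip the quote from the last pair only.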

Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Upvotes: 2

damonholden

Reputation: 1172

This can be done with lookahead, lookbehind, and boundary-type assertions:

/(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/
  • (?<=^\$translateProvider\.translations\('de', {) is a lookbehind assertion that checks for '$translateProvider.translations('de', {' at the beginning of the string.
  • (?=}\)$) is a lookahead assertion that checks for '})' at the end of the string.
  • [\s\S]* is a character class with a * quantifier that matches any sequence of characters, newlines included, between the two assertions.
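
As a rough usage sketch, the pattern can be applied with String.prototype.match in Node. The variable names and the shortened input are assumptions for illustration. Because the pattern is anchored with ^ and $, this assumes the string holds only the function call and nothing before or after it.

```javascript
// Shortened version of the input from the question (illustrative only).
const source = `$translateProvider.translations('de', {
        WASTE_TYPE: 'Abfallart',
        WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
        UNIT_ADD: 'Einheit hinzufügen'
})`;

// The regex from this answer: lookbehind for the call prefix,
// lookahead for the closing "})" at the end of the string.
const bodyRegex = /(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/;

const match = source.match(bodyRegex);

// match[0] is the text between the outer braces; trim the surrounding newlines.
const body = match ? match[0].trim() : null;
console.log(body);
```

The inner {{ and }} do not end the match early, because the lookahead only succeeds at a }) that sits at the very end of the string.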

Here is the regex101 link for you to test

Hope this helps.

Upvotes: 1
