DavidM

Reputation: 227

Find strings between quotes, but before colon?

I've tested all sorts of examples and none of them seem to work for me.

Given a string like this:

"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}

Is there a RegEx I can use to get just the strings within the double quotes that precede a colon? In this case I just want FOO,BAR,FIND,ME. No other values.

Thanks,

Upvotes: 2

Views: 885

Answers (5)

mklement0

Reputation: 437978

If you must use a regex to solve this problem, here's a concise solution (PSv3+):

[regex]::Matches(
  '"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}',
  '(?<=")[^"]+(?=":)'
).Value   # -> 'FOO', 'BAR', 'FIND', 'ME'
  • Regex (?<=")[^"]+(?=":) matches the content of all "..."-enclosed tokens that are immediately followed by a :. The initial " and the following ": are not included in the match, due to the use of look-behind - (?<=...) - and look-ahead - (?=...) - assertions; the content itself is matched by [^"]+, a nonempty run (+) of characters other than " ([^"]).

  • .Value extracts the matched string from each [System.Text.RegularExpressions.Match] instance returned by the (implicit) enumeration of the [System.Text.RegularExpressions.MatchCollection] collection returned by the [regex]::Matches() method.
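
The implicit enumeration described above can also be spelled out by hand. This is a minimal sketch, assuming the sample string from the question; the variable names $inputText, $found and $names are only illustrative. Because it doesn't rely on calling .Value on the collection as a whole, it should also work in versions older than PSv3:

```powershell
# Sample string from the question.
$inputText = '"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}'

# [regex]::Matches() returns a MatchCollection of Match objects.
$found = [regex]::Matches($inputText, '(?<=")[^"]+(?=":)')

# Enumerate the collection explicitly and collect each match's text.
$names = foreach ($m in $found) { $m.Value }

$names   # -> 'FOO', 'BAR', 'FIND', 'ME'
```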


If the input is (effectively) JSON, consider a ConvertFrom-Json-based approach, as in FoxDeploy's answer.

Here's a generalization of that approach: a function that walks an arbitrary object graph and reports a flat list of its property names, demonstrated with the (nested) custom object that ConvertFrom-Json creates from the JSON input:

# Define the walker function:
# Get all property names in a given object's hierarchy.
function get-PropertyName {
  param([Parameter(ValueFromPipeline)] [object] $InputObject)
  process {   
    if ($null -eq $InputObject -or $InputObject -is [DbNull] -or $InputObject.GetType().IsPrimitive -or $InputObject.GetType() -in [string], [datetime], [datetimeoffset], [decimal], [bigint]) {
      # A null-like value or a primitive or quasi-primitive type:
      # No constituent properties to report.
    }
    elseif ($InputObject -is [System.Collections.IEnumerable] -and $InputObject -isnot [System.Collections.IDictionary]) {
      # A collection of sorts (other than a string or dictionary (hash table)), 
      # recurse on its elements.
      foreach ($o in $InputObject) { get-PropertyName $o }
    }
    else { 
      # A non-quasi-primitive scalar object or a dictionary:
      # enumerate its properties / entries.
      $props = if ($InputObject -is [System.Collections.IDictionary]) { $InputObject.GetEnumerator() } else { $InputObject.psobject.properties }
      foreach ($p in $props) {
        $p.Name
        # Recurse
        get-PropertyName $p.Value
      }
    }
  }
}

# Make the input string a JSON string.
$str = '"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}'
$json = '{' + $str + '}'

# Convert the JSON string to a custom-object hierarchy and let the walker
# function enumerate all of its properties as a flat list.
$json | ConvertFrom-Json | get-PropertyName # -> 'FOO', 'BAR', 'FIND', 'ME'

Caveat: The get-PropertyName function doesn't check for circular references between properties, which could result in an infinite recursion; with JSON as the input, that isn't a concern, however.

Upvotes: 1

Lee_Dailey

Reputation: 7479

this is likely too fragile, but it does work with the limited sample you provided ... [grin]

$InStuff = '"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}'

$InStuff -split ':' |
    Where-Object {$_ -match '"$'} |
    ForEach-Object {
        if ($_ -match ',')
            {
            $_.Split(',')[1]
            }
            else
            {
            $_
            }} |
    ForEach-Object {$_.Trim('{"')}

output ...

FOO
BAR
FIND
ME
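
To make the fragility concrete, here is a small sketch that runs the same pipeline on a different, hypothetical input whose last value is a plain quoted string. The input and the $result variable are assumptions for illustration only; the pipeline logic is unchanged, just written more compactly:

```powershell
# Hypothetical input: the value "1" ends with a double quote,
# so the '"$' filter cannot tell it apart from a name.
$InStuff = '"A":"1"'

$result = $InStuff -split ':' |
    Where-Object { $_ -match '"$' } |
    ForEach-Object { if ($_ -match ',') { $_.Split(',')[1] } else { $_ } } |
    ForEach-Object { $_.Trim('{"') }

$result   # contains the value 1 in addition to the name A
```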

Upvotes: 0

Walter Tross

Reputation: 12624

A quick look at some examples tells me that in PowerShell you can do something like this:

$results = $data | Select-String '"([^"]+)":' -AllMatches

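which should store the matches in $results.Matches; for each match, group number 1 (the first and only pair of parentheses) holds the text between the quotes:

$results.Matches[0].Groups[1].Value
$results.Matches[1].Groups[1].Value
...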

[^"] matches any non-double-quote character
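
As a rough end-to-end sketch of this approach, assuming $data holds the sample string from the question ($names is just an illustrative variable name), all group-1 values can be collected in one go:

```powershell
# Sample string from the question.
$data = '"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}'

# Select-String returns a MatchInfo object; with -AllMatches its
# .Matches property holds every regex match found in the input line.
$results = $data | Select-String -Pattern '"([^"]+)":' -AllMatches

# Capture group 1 of each match is the text between the quotes.
$names = $results.Matches | ForEach-Object { $_.Groups[1].Value }

$names   # -> 'FOO', 'BAR', 'FIND', 'ME'
```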

Upvotes: 1

Jacob

Reputation: 1192

The regex [A-Z0-9]+(?=\"(?=:)) will work for the sample string provided.

$string = '"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}'
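# Note: $found is used rather than $matches, because $Matches is an
# automatic variable that the -match operator overwrites.
$found = ([regex]'[A-Z0-9]+(?=\"(?=:))').Matches($string)
$found.Value

$found.Value returns (as an array):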

FOO
BAR
FIND
ME

Regex explanation

[A-Z0-9] - Match any single character in the ranges A-Z or 0-9

+ - Match one or more times, as many times as possible

(?=...) - Positive lookahead, only matches if the specific character in the lookahead follows.

\" - match the character " literally

: - Match the character : literally.

Put together, [A-Z0-9]+ matches a run of one or more characters from A-Z and 0-9, and (?=\"(?=:)) lets the match succeed only when that run is followed by " and then :.

More info on the regex: https://regex101.com/r/PsKS6N/1/
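
Since the character class only covers uppercase letters and digits, names containing other characters are not matched in full. The following is a sketch on a hypothetical input with mixed-case names; $sample, $strict and $loose are illustrative names, and the looser [^"]+ pattern is a swapped-in alternative, not part of the regex above:

```powershell
# Hypothetical input with names that are not purely uppercase.
$sample = '"Foo":{"my_KEY":"0x507A"}'

# The original pattern: only runs of A-Z / 0-9 directly before ": can match.
$strict = ([regex]'[A-Z0-9]+(?="(?=:))').Matches($sample) |
    ForEach-Object { $_.Value }

# A looser variant: any run of non-quote characters directly before ":.
$loose = ([regex]'[^"]+(?="(?=:))').Matches($sample) |
    ForEach-Object { $_.Value }

$strict   # -> 'KEY' (only the uppercase tail of my_KEY; Foo is missed)
$loose    # -> 'Foo', 'my_KEY'
```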

Upvotes: 1

FoxDeploy

Reputation: 13537

Your input string is only missing the enclosing { and } to be valid JSON, so if we add them like so...

$json = '{"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}}'

We can then retrieve the name of the parent node (FOO), and the names of each of its properties as well.

$json = '{"FOO":{"BAR":"0x507A","FIND":"DONTFINDME","ME":["0x3214"]}}'
$objects = ConvertFrom-Json $json

$objects.PSObject.Properties | ForEach-Object {
    # Emit one object per top-level property: its name plus the names of its child properties.
    [pscustomobject]@{
        ParentColumn   = $_.Name
        ChildrensNames = $_.Value.PSObject.Properties | ForEach-Object { $_.Name }
    }
}


ParentColumn ChildrensNames 
------------ -------------- 
FOO          {BAR, FIND, ME}

Let me know if this gets you moving in the right direction :)

Upvotes: 3
