Juan Diego

Reputation: 55

PowerShell ConvertTo-Json problem with double-quotation marks

I am trying to convert a text file to a JSON-formatted string, but the double-quotation marks are not correctly positioned.

My file.txt contains the following structured information (two empty lines at the beginning as well):

adapter_name          : empty1
route_age             : 10
route_nexthop         : 172.0.0.1
route_protocol        : NETMGMT1
speed                 : null

adapter_name          : empty2
route_age             : 100
route_nexthop         : 172.0.0.2
route_protocol        : NETMGMT2
speed                 : null

adapter_name          : empty3
route_age             : 1000
route_nexthop         : 172.0.0.3
route_protocol        : NETMGMT3
speed                 : null

My code is:

$data = Get-Content C:\scripts\file.txt | %{$_.PSObject.BaseObject}
$data | ConvertTo-Json 

Without this part:

%{$_.PSObject.BaseObject}
It's just descending very deeply into the object tree which could take a long time.

The actual result is:


    [
        "",
        "",
        "adapter_name          : empty1",
        "route_age             : 10",
        "route_nexthop         : 172.0.0.1",
        "route_protocol        : NETMGMT1",
        "speed                 : null ",
        "",
        "adapter_name          : empty2",
        "route_age             : 100",
        "route_nexthop         : 172.0.0.2",
        "route_protocol        : NETMGMT2",
        "speed                 : null ",
        "",
        "adapter_name          : empty3",
        "route_age             : 1000",
        "route_nexthop         : 172.0.0.3",
        "route_protocol        : NETMGMT3",
        "speed                 : null "
    ]

And the expected result is:

[
  {
    "adapter_name"         : "empty1",
    "route_age"            : 10,
    "route_nexthop"        : "172.0.0.1",
    "route_protocol"       : "NETMGMT1",
    "speed"                : null
  },
  {
    "adapter_name"         : "empty2",
    "route_age"            : 100,
    "route_nexthop"        : "172.0.0.2",
    "route_protocol"       : "NETMGMT2",
    "speed"                : null
  },
  {
    "adapter_name"         : "empty3",
    "route_age"            : 1000,
    "route_nexthop"        : "172.0.0.3",
    "route_protocol"       : "NETMGMT3",
    "speed"                : null
  }
]

Examples 4 and 5 in the documentation at https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/convertto-json?view=powershell-6 show how to use the ConvertTo-Json cmdlet in a similar situation, but without these problems.

Upvotes: 3

Views: 4004

Answers (1)

mklement0

Reputation: 440102

Get-Content merely returns the individual lines from a text file; it knows nothing about any structure that may be encoded in those lines.

Therefore, you're just converting the lines as-is to JSON, which results in the flat list of JSON string values you're seeing.
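To illustrate, here is a minimal sketch that uses an in-memory array of strings as a stand-in for the lines Get-Content would return (the two sample lines and the variable names are only illustrative):

```powershell
# Stand-in for the lines Get-Content returns: plain strings, no structure.
$lines = @(
  'adapter_name          : empty1'
  'route_age             : 10'
)

# Each string becomes one JSON string value; no key/value pairs are inferred.
$json = $lines | ConvertTo-Json
$json
```

The result is a JSON array of two strings, not an object with two properties, which matches the flat list shown in the question.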

To fix that problem, you must perform your own parsing of the text file into structured objects (hash tables), block by block, and pass those to ConvertTo-Json:

# Read the input file as a whole (-Raw) and split it into blocks (paragraphs)
(Get-Content -Raw C:\scripts\file.txt) -split '\r?\n\r?\n' -ne '' |
  ForEach-Object { # Process each block
    # Initialize an ordered hashtable for the key-values pairs in this block.
    $oht = [ordered] @{}
    # Loop over the block's lines.
    foreach ($line in $_ -split '\r?\n' -ne '') {
      # Split the line into key and value...
      $key, $val = $line -split ':', 2
      # ... and add them to the hashtable.
      $oht[$key.Trim()] = $val.Trim()
    }
    $oht # Output the hashtable.
  } | ConvertTo-Json

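The above yields the desired structure, a JSON array with one object per block, though all values are emitted as JSON strings (e.g. "10" rather than 10).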
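If the unquoted numbers and the null from the expected result are needed, the values have to be converted before they are stored, since text split from a line is always a string. A minimal sketch for a single block follows. It assumes that all-digit text should become an [int] and that the literal text null should become $null; that rule, the $typed variable, and the in-memory sample block are assumptions for illustration, not part of the original solution.

```powershell
# Sample text standing in for one block of file.txt.
$block = @(
  'adapter_name          : empty1'
  'route_age             : 10'
  'route_nexthop         : 172.0.0.1'
  'speed                 : null'
) -join "`n"

$oht = [ordered] @{}
foreach ($line in $block -split '\r?\n' -ne '') {
  $key, $val = $line -split ':', 2
  $val = $val.Trim()
  # Assumed conversion rule: 'null' -> $null, digits only -> [int], else string.
  if ($val -eq 'null') { $typed = $null }
  elseif ($val -match '^\d+$') { $typed = [int] $val }
  else { $typed = $val }
  $oht[$key.Trim()] = $typed
}

$json = $oht | ConvertTo-Json
$json
```

The IP address stays a string because it contains dots, so only route_age is converted to a number. Values too large for [int] would need a wider type such as [long].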


As an aside, re:

Without this part:
%{$_.PSObject.BaseObject}
It's just descending very deeply into the object tree which could take a long time.

The issue is that Get-Content decorates the lines it outputs with additional, normally invisible properties that provide origin information, such as the path of the file the lines were read from.

These normally hidden properties surface unexpectedly in serialization scenarios, such as when ConvertTo-Json is used.
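The decoration can be made visible with a small experiment. The sketch below writes a one-line sample file to the temp directory (the file name and its content are hypothetical), reads it back with Get-Content, and lists the NoteProperty members attached to the resulting string. For comparison, it reads the same file with [System.IO.File]::ReadAllLines, which returns plain .NET strings.

```powershell
# Write a small sample file to a temporary location (hypothetical sample data).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'get-content-demo.txt'
Set-Content -LiteralPath $path -Value 'adapter_name : empty1'

# Read the line back; Get-Content attaches NoteProperty members to the string.
$line = Get-Content -LiteralPath $path -TotalCount 1
$noteProps = $line.PSObject.Properties |
  Where-Object MemberType -eq 'NoteProperty' |
  ForEach-Object Name
$noteProps   # Lists names such as PSPath and ReadCount.

# Reading via .NET returns plain strings without those extra members.
$plain = [System.IO.File]::ReadAllLines($path)[0]
$plainProps = @($plain.PSObject.Properties |
  Where-Object MemberType -eq 'NoteProperty')

Remove-Item -LiteralPath $path
```

Here $noteProps contains the provider-added names, while $plainProps is empty.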

The solution above implicitly bypasses this problem, because new strings are being created during processing.

While the additional properties can be useful, they're frequently not needed, and adding them also slows Get-Content down.

  • This GitHub issue proposes adding a switch to Get-Content that allows opting out of having the lines decorated (not implemented as of PowerShell Core 7.0.0-preview.3)

  • Complementarily, this GitHub issue proposes that PowerShell-added properties be ignored for types that correspond to primitive JSON types, which includes [string] (not implemented as of PowerShell Core 7.0.0-preview.3)

Upvotes: 1
