Solaflex

Reputation: 1432

Powershell Invoke-WebRequest and character encoding

I am trying to retrieve information from Spotify through their Web API, but characters with diacritics (ä, ö, ü, etc.) come back garbled.

Let's take Tiësto as an example. Spotify's API Browser displays the name correctly: https://developer.spotify.com/web-api/console/get-artist/?id=2o5jDhtHVPhrJdv3cEQ99Z

If I make an API call with Invoke-WebRequest, I get

Ti??sto

as the name:

function Get-Artist {
param($ArtistID = '2o5jDhtHVPhrJdv3cEQ99Z',
      $AccessToken = 'MyAccessToken')


$URI = "https://api.spotify.com/v1/artists/{0}" -f $ArtistID

# The Authorization header takes the form "Bearer <token>", with a single space.
$JSON = Invoke-WebRequest -Uri $URI -Headers @{"Authorization"= ('Bearer ' + $AccessToken)}
$JSON = $JSON | ConvertFrom-Json
return $JSON
}


How can I get the correct name?

Upvotes: 2

Views: 13874

Answers (4)

Rogerio

Reputation: 1

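This works for me: convert the mis-decoded content back to bytes using ISO-8859-1, then decode those bytes as UTF-8: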

$urlGeral = "https://your_site_with_json_file/json/File.json"

$objGeral = Invoke-WebRequest -UseBasicParsing -Method Get -Uri $urlGeral

$objGeral = [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($objGeral.Content))

$JsonGERAL = $objGeral | ConvertFrom-Json

Upvotes: 0

mklement0

Reputation: 437373

Update: PowerShell (Core) 7.0+ now defaults to UTF-8 for JSON, and in 7.4+ to UTF-8 in general in the absence of a (valid) charset attribute in the HTTP response header, so the problem no longer arises there.


Jeroen Mostert, in a comment on the question, explains the problem well:

The problem is that Spotify is (unwisely) not returning the encoding it's using in its headers. PowerShell obeys the [now obsolete] standard by assuming ISO-8859-1, but unfortunately the site is using UTF-8. (PowerShell ought to ignore standards here and assume UTF-8, but that's just like, my opinion, man.) More details here, along with the follow-up ticket.
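To see the mismatch described in that comment in isolation, here is a minimal sketch that needs no network access. It is an illustration only, not part of the quoted comment, and the variable names are my own: it encodes the name as UTF-8 (as the site does) and then decodes the bytes as ISO-8859-1 (as Invoke-WebRequest does when no charset is given).

```powershell
$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# "Tiësto", built from a code point so the sketch does not depend on
# the encoding of the script file itself.
$name = 'Ti' + [char]0x00EB + 'sto'

# UTF-8 stores the "ë" (U+00EB) as two bytes: 0xC3 0xAB.
$bytes = $utf8.GetBytes($name)

# Decoding those bytes as ISO-8859-1 turns each byte into its own
# character, so one letter becomes two.
$misdecoded = $latin1.GetString($bytes)

'{0} chars -> {1} bytes -> {2} chars' -f $name.Length, $bytes.Length, $misdecoded.Length
```

The two characters that replace the single "ë" are consistent with the two placeholder characters in the "Ti??sto" output from the question, although how they are rendered depends on the console.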

A workaround that doesn't require the use of temporary files:

Manually decode the raw byte stream of the response as UTF-8:

$JSON = 
  [Text.Encoding]::UTF8.GetString(
    (Invoke-WebRequest -Uri $URI ...).RawContentStream.ToArray()
  )

Alternatively, use the convenience function ConvertTo-BodyWithEncoding; assuming it has been defined (see below), you can more simply use the following:

# ConvertTo-BodyWithEncoding defaults to UTF-8.
$JSON = Invoke-WebRequest -Uri $URI ... | ConvertTo-BodyWithEncoding

Convenience function ConvertTo-BodyWithEncoding:

Note:

  • The function manually decodes the raw bytes that make up the given response's body, as UTF-8 by default, or with the given encoding, which can be specified as a [System.Text.Encoding] instance, a code-page number (e.g. 1251), or an encoding name (e.g., 'utf-16le').

  • The function is also available as an MIT-licensed Gist, and only the latter will be maintained going forward. Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can define it directly as follows (instructions for how to make the function available in future sessions or to convert it to a script will be displayed):

    irm https://gist.github.com/mklement0/209a9506b8ba32246f95d1cc238d564d/raw/ConvertTo-BodyWithEncoding.ps1 | iex
    
function ConvertTo-BodyWithEncoding {

  [CmdletBinding(PositionalBinding=$false)]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [Microsoft.PowerShell.Commands.WebResponseObject] $InputObject,
    # The encoding to use; defaults to UTF-8
    [Parameter(Position=0)]
    $Encoding = [System.Text.Encoding]::Utf8
  )

  begin {
    if ($Encoding -isnot [System.Text.Encoding]) {
      try {
        $Encoding = [System.Text.Encoding]::GetEncoding($Encoding)
      }
      catch { 
        throw
      }
    }
  }

  process {
    $Encoding.GetString(
       $InputObject.RawContentStream.ToArray()
    )
  }

}

Upvotes: 9

Paul Dwyer

Reputation: 1

Have you tried something like this?

$output = [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::GetEncoding("iso-8859-1").GetBytes($JSON.Name))

I found this line somewhere and use it to convert text returned by an API to UTF-8. I'm not quite sure why it is needed, since I believe JSON is supposed to be UTF-8 anyway.
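As a hedged illustration of why this kind of round trip can work (my own sketch, not taken from any of the answers): .NET's ISO-8859-1 encoding maps every byte value 0-255 to the character with the same code point, so re-encoding a mis-decoded string with it should give back the original bytes, which can then be decoded as UTF-8. The `$garbled` value below is a hand-built stand-in for what the API call returns.

```powershell
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# Stand-in for the mis-decoded name: the UTF-8 bytes 0xC3 0xAB of "ë"
# were each turned into a separate character.
$garbled = 'Ti' + [char]0x00C3 + [char]0x00AB + 'sto'

# Step 1: recover the original bytes (one byte per character).
$originalBytes = $latin1.GetBytes($garbled)

# Step 2: decode those bytes with the encoding the server actually used.
$repaired = [System.Text.Encoding]::UTF8.GetString($originalBytes)

$repaired
```

This only helps if the text really was UTF-8 decoded as ISO-8859-1; applied to a string that was decoded correctly, it would corrupt any non-ASCII characters instead of repairing them.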

Upvotes: 0

Solaflex

Reputation: 1432

Issue solved with the workaround provided by Jeroen Mostert. You have to save the response to a file and explicitly tell PowerShell which encoding to use when reading it back. This workaround works for me because my program is not time-critical, so the extra read/write I/O does not matter.

function Invoke-SpotifyAPICall {
param($URI,
      $Header = $null,
      $Body = $null
      )

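# Pass only the parameters that were supplied, so the request is also sent
# when both $Header and $Body are given (the original if/elseif skipped that case).
$Params = @{ Uri = $URI; OutFile = ".\SpotifyAPICallResult.txt" }
if ($null -ne $Header) { $Params.Headers = $Header }
if ($null -ne $Body)   { $Params.Body = $Body }
Invoke-WebRequest @Params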

$JSON = Get-Content ".\SpotifyAPICallResult.txt" -Encoding UTF8 -Raw | ConvertFrom-JSON
Remove-Item ".\SpotifyAPICallResult.txt" -Force
return $JSON

}

function Get-Artist {
    param($ArtistID = '2o5jDhtHVPhrJdv3cEQ99Z',
          $AccessToken = 'MyAccessToken')


    $URI = "https://api.spotify.com/v1/artists/{0}" -f $ArtistID

    # The Authorization header takes the form "Bearer <token>", with a single space.
    return (Invoke-SpotifyAPICall -URI $URI -Header @{"Authorization"= ('Bearer ' + $AccessToken)})
}


Get-Artist

Upvotes: 2
