leolapa
leolapa

Reputation: 43

Figure out character showing as space on Power Query

After running an API GET request method I get a set of data whose snippet is shown by the below screenshot:

enter image description here

However when I run a Merge Queries of the Contract column with another table so I can get each correspondent translation to Portuguese it'll only match a few items, as shown below:

enter image description here

After checking the other table's contents every correspondent Contract was there, but then after copying the entire Contract column from the API dataset and pasting it to a blank Excel spreadsheet I realized that the spaces displayed on Power Query are actually some unrecognizable characters displayed as "?" inside a diamond:

enter image description here

I've already tried the Clean tool but it won't work. I also tried other think such as Trim and Replace Values... space by something else and no glory.

Something tells me I need to figure out which character is actually being imported instead of space so I can try to replace it on Power Query, but the question is: how can I do that? How can I turn those "?" into readable characters in Excel?!

Leonardo

Upvotes: 0

Views: 1040

Answers (2)

ninMonkey
ninMonkey

Reputation: 7511

What is ?

There's a special character named Replacement Char which is the codepoint u+fffd or in power query: #(fffd)

If you see one, it means text is either encoded or decoded using the wrong encoding, which causes errors.

enter image description here

Check your encoding, the UI picks the wrong encoding

enter image description here

You say it's a web api, so its almost always UTF-8. However the default UI doesn't usually pick that.

  • The UI often picks 1252 which is TextEncoding.Windows
  • The web almost always uses 65001 which is TextEncoding.Utf8

Wait, your query ended up with ?

This is why I used the term code-unit instead of code-point. Text.ToList converts text into a list of 2-byte [char]s.

How do I enumerate characters in a string?

= Text.ToList("hi world")

How do I tell what a character is?

= Character.ToNumber("x")

https://learn.microsoft.com/en-us/powerquery-m/character-tonumber

Query

let
    // helper function, 
    Csv = (source as list) as text =>
        Text.Combine(
            List.Transform( source, (item) =>
                try Text.From(item)
                catch (e) => "<error>"//  e[Message]
            ),
            ", "
        ),

    ReplaceSymbol = "#(fffd)",
    Source = "hi world #(0001f412)",

    Letters = Text.ToList( Source ),
    CodeUnits = List.Transform( Letters,
        (char) =>
            try Character.ToNumber( char )
            catch (e) => "invalid" meta [ Char = char ]
    ),
    CodeUnits2 = List.Transform( Letters,
        (char) =>
            try Character.ToNumber( char )
            catch (e) => ReplaceSymbol meta [ Char = char ]
    ),

    Summary = [
        Source = Source,
        CodeUnit = Csv( CodeUnits ),
        CodeUnit2 = Csv( CodeUnits2 ),
        Letters = Csv( Letters )
    ]
in
    Summary

Upvotes: 1

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

try something like:

=UNICODE(MID(A4, SEQUENCE(,LEN(A4)),1))

and fill down. You should see codes for all the characters in the adjacent row. (Be sure the row is clear or you will the #SPILL! error)

Upvotes: 2

Related Questions