Uden VH

Reputation: 177

How can I unzip a base64 encoded string in R?

Goal

The goal is to make configuration and code readable after it has been exported from an application that stores this data base64-encoded and gzipped.

Test in Linux-shell

Example of a string with code

"H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDuHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="

Decoded and gunzipped in a Linux shell with the command:

echo $1 | base64 -d | gunzip -c

Which results in:

plugin_applies_if_config<split>plugin_config=<?xml version="1.0" encoding="UTF-8"?>
<BusinessRule>
  <BusinessPlugin BusinessRulePluginID="JavaScriptBusinessConditionWithBinds">
    <Parameters>
      <Parameter ID="Binds" Type="java.lang.String">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;BindMap/&gt;
</Parameter>
      <Parameter ID="ErrorMessages" Type="java.lang.String"></Parameter>
      <Parameter ID="JavaScript" Type="java.lang.String">return false;</Parameter>
    </Parameters>
  </BusinessPlugin>
</BusinessRule>
<split>

Task accomplished. ...almost.

Turn into R-script

As I have several hundred of these strings, I want to perform the same steps as in the Linux shell in a script. And because I only know some R, I tried using R. I successfully extracted the strings from the XML document that was exported from the application and turned these into a data frame with columns id, name and code.

The following is a simplified example where I try to reproduce the Linux commands step by step.

encoded = "H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDutBhDERcHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="

decoded = base64enc::base64decode(what=encoded)
# decoded = openssl::base64_decode(encoded)
# decoded = jsonlite::base64_dec(encoded)
# 3 times the same result

str(decoded)
# a vector of type raw. Maybe I need to convert it to a string?
paste(decoded, collapse = "")

Doesn't look like the base64 decoded data in the Linux shell, but let's try to unzip...

decompressed <- 
  tryCatch({  
    memDecompress(from = paste(decoded, collapse = ""),
                  type = "gzip",
                  asChar = TRUE)
  },
  error = function(cond) {
    message(cond)
    return(NA)
  })
# fails with "internal error -3 in memDecompress(2)" 
(decompressed)

Clearly the input for 'gzip' is not what it expects. It must be some sort of binary string.

But how to get there? What am I doing wrong? Thanks for your advice!

Upvotes: 6

Views: 2909

Answers (1)

MrFlick

Reputation: 206401

The memDecompress function was improved in R version 4.0.0 to work properly. You should now be able to do

memDecompress(base64enc::base64decode(what=encoded), "gzip", asChar=TRUE)
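
Note that the raw vector is passed to memDecompress directly. In the question, paste(decoded, collapse = "") turns the bytes into a character string of hex digits, which is no longer gzip data. A minimal sketch of the difference, using only the three bytes that the decoded data starts with (the variable names are just for illustration):

```r
# First three bytes of a gzip stream: magic number 1f 8b, then 08 (deflate).
raw_bytes <- as.raw(c(0x1f, 0x8b, 0x08))

# paste() converts each byte to its two-character hex text.
hex_text <- paste(raw_bytes, collapse = "")

is.raw(raw_bytes)   # TRUE: a raw vector, which is what memDecompress() expects
is.raw(hex_text)    # FALSE: a character string of hex digits
hex_text            # "1f8b08"
```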

Previous versions were troublesome because they ignored standard headers. Here's a workaround for older versions of R. Basically, we create a raw connection over the decoded bytes and then use gzcon to decompress them:

con <- rawConnection(base64enc::base64decode(what=encoded))
readLines(gzcon(con))
close(con)

You will get a warning that there is an "incomplete final line", but that is only because there doesn't seem to be a newline at the end of the data. The data seems fine otherwise.
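
Since the question mentions several hundred strings in a data frame with columns id, name and code, the workaround could be wrapped in a small function and applied per row. This is only a sketch under a few assumptions: decode_gzip_base64 is a made-up name, the sample data frame is built here just so the example runs on its own, and the base64enc package is assumed to be installed.

```r
# Hypothetical helper: base64 string -> decompressed text as one string.
decode_gzip_base64 <- function(encoded) {
  raw_bytes <- base64enc::base64decode(what = encoded)
  con <- gzcon(rawConnection(raw_bytes))
  on.exit(close(con))
  # warn = FALSE silences the "incomplete final line" warning
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

# Build a tiny gzip + base64 sample so the sketch is self-contained.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeLines(c("hello", "world"), gz)
close(gz)
sample_code <- base64enc::base64encode(readBin(tmp, "raw", n = file.size(tmp)))
unlink(tmp)

# Stand-in for the data frame described in the question.
exported <- data.frame(id = 1:2, name = c("a", "b"),
                       code = c(sample_code, sample_code),
                       stringsAsFactors = FALSE)

# Decode every row; vapply guarantees one string per input.
exported$decoded <- vapply(exported$code, decode_gzip_base64,
                           character(1), USE.NAMES = FALSE)
```

Joining the lines with "\n" keeps each decoded document in a single data frame cell; drop the paste() call if a character vector of lines is more convenient.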

Upvotes: 8
