Reputation: 177
The goal is to make configuration and code readable after it has been exported from an application that stores this data in a base64-encoded, gzipped format.
Example of a string with code:
"H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDuHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="
It can be decoded and gunzipped in a Linux shell with the command:
echo $1 | base64 -d | gunzip -c
Which results in:
plugin_applies_if_config<split>plugin_config=<?xml version="1.0" encoding="UTF-8"?>
<BusinessRule>
<BusinessPlugin BusinessRulePluginID="JavaScriptBusinessConditionWithBinds">
<Parameters>
<Parameter ID="Binds" Type="java.lang.String"><?xml version="1.0" encoding="UTF-8"?>
<BindMap/>
</Parameter>
<Parameter ID="ErrorMessages" Type="java.lang.String"></Parameter>
<Parameter ID="JavaScript" Type="java.lang.String">return false;</Parameter>
</Parameters>
</BusinessPlugin>
</BusinessRule>
<split>
Task accomplished. ...almost.
As I have several hundred of these strings, I want to perform the same steps as in the Linux shell, but in a script. And because I only know some R, I tried using R. I successfully extracted the strings from the XML document that was exported from the application and turned them into a data frame with columns id, name and code.
The following is a simplified example where I try to reproduce the Linux commands step by step.
encoded = "H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDutBhDERcHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="
decoded = base64enc::base64decode(what=encoded)
# decoded = openssl::base64_decode(encoded)
# decoded = jsonlite::base64_dec(encoded)
# 3 times the same result
str(decoded)
# an array of raw-types. Maybe i need to convert to a string?
paste(decoded, collapse = "")
Doesn't look like the base64 decoded data in the Linux shell, but let's try to unzip...
decompressed <-
  tryCatch({
    memDecompress(from = paste(decoded, collapse = ""),
                  type = "gzip",
                  asChar = TRUE)
  },
  error = function(cond) {
    message(cond)
    return(NA)
  })
# fails with "internal error -3 in memDecompress(2)"
(decompressed)
Clearly the input for 'gzip' is not what it expects. It must be some sort of binary string.
But how do I get there? What am I doing wrong? Thanks for your advice!
Upvotes: 6
Views: 2909
Reputation: 206401
The memDecompress function was improved in R version 4.0.0 to handle this kind of input properly. You should now be able to do:
memDecompress(base64enc::base64decode(what=encoded), "gzip", asChar=TRUE)
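The attempt in the question failed because paste(decoded, collapse = "") turns the raw vector into a string of hex digits, whereas memDecompress needs the raw bytes themselves. Here is a minimal sketch of the difference. It assumes R >= 4.0.0 with the base64enc package installed, and it builds its own small gzip payload rather than using the exported string; payload, gz_bytes, encoded_demo and the other names are made up for illustration.

```r
# Build a small gzip stream to experiment with (a stand-in for the application's data).
payload <- "return false;"
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "wb")
writeBin(charToRaw(payload), con)
close(con)
gz_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Encode it the same way the application does: gzip first, then base64.
encoded_demo <- base64enc::base64encode(gz_bytes)

# base64decode() returns a raw vector: one element per byte.
decoded_demo <- base64enc::base64decode(what = encoded_demo)
is.raw(decoded_demo)

# paste() converts each byte to two hex characters, so the result is text, not binary data.
hex_text <- paste(decoded_demo, collapse = "")
substr(hex_text, 1, 4)   # "1f8b", the gzip magic number written out as text

# Passing the raw vector directly is what memDecompress expects (R >= 4.0.0).
result_demo <- memDecompress(decoded_demo, type = "gzip", asChar = TRUE)
```

So no conversion to a string is needed between the base64 step and the decompression step; the raw vector is already the "binary string" the question was looking for.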
Earlier versions were troublesome because they did not handle the standard gzip header. Here's a workaround for older versions of R. Basically, we create a raw connection over the decoded bytes and then use gzcon to decompress them:
con <- rawConnection(base64enc::base64decode(what=encoded))
readLines(gzcon(con))
close(con)
You will get a warning about an "incomplete final line", but that's just because there is no newline at the end of the decompressed data. The data seems fine otherwise.
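Since the question mentions several hundred strings in a data frame with columns id, name and code, the gzcon approach could be wrapped in a function and applied to the whole column. This is only a sketch: decode_gzip_base64 and make_encoded are hypothetical helper names, the data frame below is made-up sample data, and the base64enc package is assumed to be installed. Passing warn = FALSE to readLines should suppress the "incomplete final line" warning.

```r
# Hypothetical helper: base64 string -> gunzipped text.
# Uses gzcon(), so it should not depend on the R 4.0.0 memDecompress change.
decode_gzip_base64 <- function(encoded) {
  con <- gzcon(rawConnection(base64enc::base64decode(what = encoded)))
  on.exit(close(con))                     # closes the wrapped raw connection too
  lines <- readLines(con, warn = FALSE)   # warn = FALSE: no "incomplete final line" warning
  paste(lines, collapse = "\n")
}

# Hypothetical helper that only creates sample input: text -> gzip -> base64.
make_encoded <- function(text) {
  tmp <- tempfile(fileext = ".gz")
  on.exit(unlink(tmp))
  con <- gzfile(tmp, "wb")
  writeBin(charToRaw(text), con)
  close(con)
  base64enc::base64encode(readBin(tmp, what = "raw", n = file.size(tmp)))
}

# Made-up data frame with the column layout described in the question.
df <- data.frame(
  id   = 1:2,
  name = c("rule_a", "rule_b"),
  code = c(make_encoded("return false;"), make_encoded("return true;")),
  stringsAsFactors = FALSE
)

# Decode every row; vapply() guarantees one character value per input string.
df$decoded <- vapply(df$code, decode_gzip_base64, character(1), USE.NAMES = FALSE)
```

If some rows might hold invalid data, the call inside vapply could be wrapped in tryCatch so that one bad string yields NA instead of stopping the whole run.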
Upvotes: 8