Stephen Ellis
Stephen Ellis

Reputation: 2701

How to identify unknown compression algorithm?

I'm trying to parse some text information from within project files of an Adobe program (Adobe Premiere Pro). The project files are gzip compressed XML files. These then contain fields which have base64 encoded fields which contain further information about a component of the project. I'm trying to look at the information within these fields.

I can't for the life of me identify the compression scheme used here. It doesn't look like GZip as it doesn't have the appropriate headers. Can anyone help?

An example of a base64 encoded field is:

AQAAAAAAAACk1QAAAAAAAENvbXByZXNzZWRUaXRsZQB4nO0da1MbSa5/iuu+L34Eg6maY4sQ2OWWhBRmk/hTytiGeM+xfbZJYH/83enRPf2cGfMIMyZdFOCR1Gp1t6TWSOPp//03Eb+KW/FVTERNfBMjsRBLMRYzMRX/FP8QTbElGvC/BpipGAB8CNipuCbsn+JCHItfgGqHaH4V+yIRB0AzE5fQ4rM4h08zsfLgF8BlBX2OCPPB6ndftKDPhujA7y78toCi7tEk4i1x5uvPogv4FV2jdEuieA899MUdYI+BaiG+w9UCpNgHuoW4gRbIOZsqAa4ruloBVvUzgOsRzce+uALsBHpjTkXUKNEinZcJ/OzD+FgGF56II+C1JOgJtWZKF4p0U5AWe7yCn3wJ82lRvhnQroIS+hhTxjOYz1VASgVHaPGK1XO0BMe7JHnHhF+IU1q3GfWwBP37kKm/2N6kdq+PaD5GwH1F/VwBPMSvJXmZ9BfAZU4yqrGHcCw/Ws/AWxUTg9dFkrnSv4G2X2kNlax5ss/FJ1jNE/h7BFy6AGuKPbI57NvHYosefD4R7wDG9GydTG/jkLorDkFLjuDnHaznObVgWh+TwNovQMbvMI4vNJKFHK2ydIU/B8g10OB48XpKPItau/Ppz5U7mweklaxleD2SutkFGlwJrZUH1FufsHh9JS1gS45XwxL4O4Pxm1gFSagvHhnj9whvQhOQZUZyzUB6k8qGsxfKk1NhD4xZyh5HQ7RzR6LweWNprzmadsF4siSur7V6r4FmIP4t9eWGLC5kJdvSSrogax9oEH9Oox9Rn7gLoj1vg9R7YAUsYT5tAl7Q98VnUvfmAEUfqDzhOVzjf2w5pR3pK7UzR8nWlN0O8aERI+YNSdgHyATa4p51l/rYopGGR1fP5WnjLuDqluxwlHrzuWWvTyNDcT+/AawPsC/kec9AX/6SvsPmxPozovk7pBbaqy/Ff4CqT7KgDGGqBDjPaJUU7BA4z0gTcE30+iCPfFqf1xn9nRby0XRo5YgZiY/wf0i+cz+1dB+TgAaOyZZwzx3LK9UijEvAJ49JlteA+xs+s6W5VpBFlRijOSQrmBLuGGgmpFu8Mq8gRtwCSWpP9Il91To9J6ArE9KqOa3Nc0u5fu9IfT9tV9ZzSJ4NVyhEZXontXf00zsEFQ+5UPb+t2QlU5IL929XL8I0qFUzis4/yvHYsbyPTch60A/iLJ1K7V5CfOP2mE2XxaO3Jo+evP+Z0B44lfEHWonaVzny9fEJ3VFcUuSM+4vpWT/KsfL+m0WVkE3OifeAbPOCaLu0fwzSVWnSPRfyWo8+i6/2eubYimjVOJW22leunrFuvNtw7TNbPkz3ijlEzXuc5oU17TKd7WrooLujRi18aVqYpW9LyXUquWye/sU9eDN00NW0+poxYv59WDhrp+7B89vWAneC3BrvFXdSHjy+Lv0dSKsZw73GiOLyHTnjGobzcCt+h886Y7ENkTfPiYth6j9ICp2ltWFob5wPVTPYpThc5VnY2rIp1rdr7Lcr7z9x/nfT/IwNdylD+YkQhWo3Ia+wssZrQpnuT7qHwlnQmuP3EaLi9uY9qLkC9r0pt2J7932BiUsAO4Y7TcxfD1PZbRjTnJD2s62YdDacaa9I3yfgm/vkNfYpOzUmuregf6qtT6faz2jWjgmOd253QV6aTx69zdP0Qmgt1yQD5itcbra/MjHaQ7S9Vqb3uKJZ6VMu4Jr84h21QRvrUCVnDzxcK+Xh07JX8e21XuAJHusrdqOviL4i+oroK9bwFZ3oK6KviL7ip/EVRTSqNqxG6V7XYG3fBO5LNMVraaF5d0IrY64a8oe1Vs/TnKpE41SyN1RDRT3tC1UnUzX7YkpdI+LKAXoDdUccsoUQXUJPjPAo2afup7PAvtHFJsQbJcA6PeJxDVeUAVDWkoXPatsraNuTknINF7Mcn+FqRj6ScwfLlEMRVUKrsSDMVFr0inRQzXwYez9vcwSrdUk+fUIj+ZJmK7pytzkne/T94n1a4tNAqsLMle0+ZRC0bbLmosUh7yJqpPG1HuHHpDnXqSyncn1CFtGWFuG2sfdff+/lJ56wR7272LBEcP18YnAwIQntzfwUlDuvJiYh6Y/IZ4zkHjRMnxFAzfFlt7ndvz3Wp12o+1xSHgXajo07o7lxJcuiSig7xNb1u9zzcOXc9llUPPdY51mR91b+dwfWGz19J10Nnyarb/OpjBM5j7cCny5s0q6/DVxxD1ERwP24mM9tIO4tfEZ4k+qzP+4vSxrqGTEhzXicpTTXsJVtuYM/l7WYFYcq2YopV1mWsrsxltIAbX7M7/0twYeqnYYjMi1/3g7UzHxCrObsge9AG77KZ1XNJzCXXks3vjAxvG+G4PeX4FXpEmyXLkG7dAl2Spdgt3QJOqVLsFe6BM302fQyZWhWQIbyPWOzAr6xWQHv2Mz1jy3KuWzDXLUp6us8i0y/5OqoK9Pes9luoxL2i1JUwYYb9F2tashRBVtGOapgzw36lkY15Cg/8mE5yo9/WI7yoyCWoxqxUDXioUZFYqJGReKiRkVio0ZF4qNGQYz0EDlsuJmVMHPmLHFWZUn5tJ+7srQrYmUpVpZiZSlWlpgqVpZiZSlWlmJlKVaWfmxcXP4dQvn3BuVnWcrPr5SfWSk/p1KNbEr5MlQhi1K+Z6xG9qR877hOZakts5F7IG9VKktapvaz2W6sLMXKUqwsxcpSrCw9Vca+Gv60GjFRrCzFytJ9KkvKl/zclSX1rFOsLMXKUqwsxcpSrCzFylKsLMXKUqws/di4uPw7hPLvDcrPspSfXyk/s1J+TqUa2ZTyZahCFqV8z1iN7En53nG9ytIOxRuvnkmm9SpLSqbms9lurCzFylKsLMXKUqwsPVXGvhr+tBoxUawsxcqSX1kyr5feGHxIXlVpICVAfPh8qDBFQnODObcbgtsjD+OwN8wVYc3lzHov4LbsKYy155jfcGnm2XxsQjnIsVXV0HnMEM5swXk6u06QhcVzlbjKwSc6Xgg+uU290TOMdVu9FuYpba1gW5tGczjwKhYhTEL6iyeSTqjuwu8ANSuC2Xjd1yFo3SDYl4nhk5z6VA/DypCpvQ250ll43Zebj/XhCZ3SxBqj3kNZo1Uakv6hnqvdBPVwaPBSdbRr+szneKmV1hA+m+TGmiV1nUAPmHtcpDh9jVe+bOtJ3Nw4iVsbJ/GrjZN425G4RU8BZEut8VmSa4rHSq99o++LwjjTG2kf7Poh0ztraOhUUrdl+ORSjX8vxtKHuGe0+BRmO398YZw9Ptyd5nIHckeocW4bnLWxMN8jnY0322LkoN56nTU6k8Zs+5b2UD7z0txhwxS8L7Nuhd5M7WLdGOfxcUr4/Tn3i1jMN+jGuCXGLdWMW1pU28339+1Cf6/OBI4xTIxhYgwTY5gYw1Qvhtl7QAzzsKyL2tN+bPQSY5efPXb5MTsTZsZ3Y8yyURJvfsxStsQxTolxSplxivuN8ofkWtQJYTHXEuOVKsYrMdcS45aXFrfEXEuMYWIME34DTcy1xNjlpcQuMdcSY5aXErOULXGMU2KcUm6cYn/H6iG5ls6DopWYa4nxSsy1xLglxi0x1xJjmBjDPEUM03xADBNzLTF2qW7sEnMtMWZ5KTFL2RLHOCXGKQ+NU2wIf//5QpgxQwgWimjUOyEe/k0hG/6O9hmNV2+M8Tm4lBzd4B6l3tNrvpdV52eyKBJa1280whXZnE3pv280nxrfOXhDEQNLOJGW4fLJojLbq3cSuhoSpkANGVAfI/HJGKF6x08IZ7cJ0do0vRy+vQy+vSAt68A47cUcnwnXVL0Mqp6ctQnZ2TQw9jDObtMF+KWBbXgtXQpXhz+BJ0SrxRUJR+Q2hdm/P69hnN2mSGafwpW5Vyhzz5HZxuo3ufot9RsmUaI5eZ5uqgHmOH2c289BSvUNtOgPgNxl9BqmNCP3EN6N4cM87Aj6Qu61rmVnUbljCnnxEAW+ExRnkve9pYwFVun4s7CIsz350/n2/CcTo2+Pvj369ujbo2/fRN+eXwmNvj369ujbo2+Pvv15fbsL40zNKXiSa/JjTGNfn4KMd0LnevQ1Xr2heftO7/C7TnsJQVXPIdghzfdYjlhfvQc+MzGXHvcuuBupN2l+pyznUHykvufe7Pr4RNZn9LlMnE/dB54t+N2i74lzBop1GTWhQ3cuW0DRkdWeOznqEDczI4o82tBuC1o206qD4qvOelDczJZoF5hxs7OLLhTpZrD+fG7RKcFxjV0Pm02VxcH1vtlUmJucwagOBVahfqP1++6tRZgmIT9zCb5jRtlFs7r0Ua4e20kWFe7naMMTsj5ciQui7VJOfJDOVDONL9ajz+KrK1oT2slXOTztc5myddy2AG6rMqr42X5XZQiW1ZptTa1XyJawFlSDq0vxl/Qk6iy1JmHmaYul4PNfbArsR50zZY9rTBqSd77UCqTTZ1TtgeW9AkvZpco2r/qtdYbVJ2lvIWvVuOIzpr6lEvIOpObJ1dlsusSasVPHKluy1zwaxOfNG1IcUVV3In0MtjszOGrtukytE7mu2yqhfY8jB32Cy0B+QlmQW4gGNeucrHCRjudfMBbWTheTUDwypRlAyAp0lWVkax5K7V2Q/XHPB2IleVxS7KtmZD3KmuwR4/UbkllruvYdHInrUwN3Ca/WOAv7ieKDRVrXUXVrxrEOrjJp1Iqr8/y4RnRFq25T1h893mbBeHdyx7tTynjrT6grK4o61P3jfrqPuDDXS9pQO06pavSyQ88wb8HfbXkfrqMMPFOKfWVT3h/fJ3pBvh1ojc8Zxfglxi+bE7+0CuMX9RzM08Qv/DTelnwmLz9+8e21SvHL5kYvpk6vG7kMBT4DcQWeYkyfb+DzF/g/rHg8w2/qz9vhO7k7fOcF7vBT+Vn39tgdX8PNLIZ+wob3ghvQSpZJX9foiS7+7N4xzWkOBtDTBfziLuWfTelTJGRnfemjdUbMhSaWzWDfNfAFyB13Qx0bqjnSMhePoFWREbRyRlDPWZ+6k82qe9muDwDp0qdjMTbsynyiOu/MyivZyvUa9UwMzthhmkXE+0fOnttQRcd2YtdMbAxCsmVnvB6lmoEZ2RZDToAWs/dsc3xehqZRM3kghrRLjMRnqk3MqP3/AUeAlD0=

Upvotes: 1

Views: 1195

Answers (2)

myselfhimself
myselfhimself

Reputation: 579

Here is a Python version for Mark Adler's algorithm in C# with the DotNetZip library:

import base64
import zlib

b64decoded = base64.b64decode(strangeString)

zlibdecompressed = zlib.decompress(b64decoded[32:])
print(zlibdecompressed.decode('utf-16le'))

Here is an example XML result for the question's hash:

<?xml version="1.0" encoding="UTF-16" ?><Adobe_Root><Adobe_Title><Version>20080702</Version><Motion_Settings><Play_Forward>true</Play_Forward><Start_on_Screen>false</Start_on_Screen><Pre_Roll>0</Pre_Roll><Ease_In>0</Ease_In><End_off_Screen>false</End_off_Screen><Post_Roll>0</Post_Roll><Ease_Out>0</Ease_Out></Motion_Settings></Adobe_Title><InscriberLayouts Version="1.0"><Layout><LayoutEffectInfo Version="2"><EffectType>0</EffectType><Indic>false</Indic></LayoutEffectInfo><LayoutDimension Version="2"><pXPIXELS>1920</pXPIXELS><pYLINES>1080</pYLINES><pSCREENAR>1</pSCREENAR><growthDirection>growRightDown</growthDirection></LayoutDimension><LayoutAttributes><SafeTitleArea><left>0.1</left><top>0.1</top><right>0.9</right><bottom>0.9</bottom></SafeTitleArea><SafeActionArea><left>0.05</left><top>0.05</top><right>0.95</right><bottom>0.95</bottom></SafeActionArea></LayoutAttributes><Background Version="4"><ShaderReference>4098</ShaderReference><On>false</On><paintingRange>normalLayout</paintingRange></Background><DefaultStyle><Reference>4098</Reference></DefaultStyle><DefaultTextDescription><Reference>4098</Reference></DefaultTextDescription><GraphicObjectDefaults><endCapType>square</endCapType><joinTypeClosed>round</joinTypeClosed><joinTypeOpen>round</joinTypeOpen><lineWidth>5</lineWidth><miterLimit>5</miterLimit><windBeziers>false</windBeziers><roundCornerFillets>37.5 37.5 37.5 37.5 37.5 37.5 37.5 37.5 </roundCornerFillets><clippedCornerFillets>37.5 37.5 37.5 37.5 37.5 37.5 37.5 37.5 </clippedCornerFillets></GraphicObjectDefaults><TextChainDefaults><normal><leading>0</leading><boxCanGrow>false</boxCanGrow><wordWrap>true</wordWrap><lockedLinesX>false</lockedLinesX><lockedLinesY>false</lockedLinesY><Alignment>left</Alignment><tabModeStyle>Word</tabModeStyle><implicitTabSpacing>100</implicitTabSpacing><implicitTabType>left</implicitTabType><tabs></tabs></normal><boxNormal><leading>0</leading><boxCanGrow>false</boxCanGrow><wordWrap>true</wordWrap><lockedLinesX>true</lockedLinesX><lockedLinesY>true</lockedLinesY><Alignment>left</Alignment><tabModeStyle>Word</tabModeStyle><implicitTabSpacing>100</implicitTabSpacing><implicitTabType>left</implicitTabType><tabs></tabs></boxNormal><blockNormal><leading>0</leading><boxCanGrow>false</boxCanGrow><wordWrap>false</wordWrap><lockedLinesX>true</lockedLinesX><lockedLinesY>true</lockedLinesY><Alignment>left</Alignment><tabModeStyle>Word</tabModeStyle><implicitTabSpacing>100</implicitTabSpacing><implicitTabType>left</implicitTabType><tabs></tabs></blockNormal><spline><leading>0</leading><boxCanGrow>false</boxCanGrow><wordWrap>false</wordWrap><lockedLinesX>false</lockedLinesX><lockedLinesY>false</lockedLinesY><Alignment>left</Alignment><tabModeStyle>Word</tabModeStyle><implicitTabSpacing>100</implicitTabSpacing><implicitTabType>left</implicitTabType><tabs></tabs></spline></TextChainDefaults><TextDescriptions Version="4"><TextDescription Reference="4096"><TypeSpec><size>360</size><txHeight>47```

Upvotes: 0

Stephen Ellis
Stephen Ellis

Reputation: 2701

It's a Zlib-compressed string with a 32-byte uncompressed header.

 var encoding = @"";
        byte[] data = Convert.FromBase64String(encoding);
        var compressedArray = new byte[data.Length - 32];
        Array.Copy(data, 32, compressedArray, 0, data.Length - 32);
        var decompressed = ZlibStream.UncompressBuffer(compressedArray);
        var str = Encoding.Unicode.GetString(decompressed);

The header contains the uncompressed length of the data in little-endian order at offset 8: a5 d5 or 0xd5a4, which equals 54692. It is hard to tell from this example if the uncompressed length is stored as a5 d5, a5 d5 00 00, or a5 d5 00 00 00 00 00 00.

Upvotes: 3

Related Questions