Bob

Reputation: 23000

How can I read the manifest of an Android apk file using C# .Net?

I have a .NET web service that needs to know the version of an apk file stored in one of its folders. I know about apktool, but I need to read the apk's manifest from within my .NET web service.

How can I do that? Do you know of any sample C# code for decoding and reading the manifest of an apk file?

Unzipping the apk and decoding the manifest file is my main problem.

Upvotes: 7

Views: 12118

Answers (3)

hylander0

Reputation: 1100

I have created a .NET library that parses the AndroidManifest.xml file data and cross-references resources.arsc to resolve resource ID values (so you don't have to).

Parse APK Manifest with C# .NET with Iteedee.ApkReader

The primary goal of this library is to parse Android application package metadata. There are many scenarios where you might need to glean information about your Android package, specifically in continuous integration setups.

After decompressing the file, all that is required is passing the needed byte array objects into the library's method:

ApkReader apkReader = new ApkReader();
ApkInfo info = apkReader.extractInfo(manifestData, resourcesData);

The ApkInfo object supplies:

  • Package Name
  • Version Name
  • Version Code
  • App Has Icon Check
  • All (dpi types) of the App Icons Found
  • Permissions
  • Screen Density Support Configuration
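
The decompression step mentioned above is not shown in the answer. As a minimal sketch, assuming the built-in System.IO.Compression classes are available (the ApkEntryReader class and its ReadEntry method are hypothetical names, not part of Iteedee.ApkReader), the two byte arrays could be obtained like this:

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of Iteedee.ApkReader.
public static class ApkEntryReader
{
    /// <summary>Returns the raw bytes of one entry of an apk (zip) stream,
    /// or null when the entry does not exist.</summary>
    public static byte[] ReadEntry(Stream apkStream, string entryName)
    {
        // leaveOpen: true so the caller can reuse the stream for another entry.
        using (var archive = new ZipArchive(apkStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null) return null;

            using (Stream entryStream = entry.Open())
            using (var buffer = new MemoryStream())
            {
                // Copy the whole decompressed entry into memory.
                entryStream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```

Calling ReadEntry once with "AndroidManifest.xml" and once with "resources.arsc" on the opened apk file stream would give the manifestData and resourcesData arguments used in the snippet above.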

Upvotes: 4

Alberto Chiesa

Reputation: 7350

This answer is not going to add any particular feature to the existing ones.

But I was unsatisfied by the readability and idiomaticity of the ported Java code.

So I started from the code published by @Bobs and refactored it until I was satisfied with the result. But, aside from my OCD, it is basically the same thing:

using System;
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

// ReSharper disable CommentTypo

namespace Utils
{
    /// <summary> Unpacks and reads data from a compressed apk manifest file. </summary>
    /// <remarks>
    /// Code coming from:
    /// https://stackoverflow.com/questions/18997163/how-can-i-read-the-manifest-of-an-android-apk-file-using-c-sharp-net/22314629
    /// </remarks>
    public class AndroidManifestReader
    {
        // Parses the 'compressed' binary form of Android XML docs 
        // such as for AndroidManifest.binaryXml in .apk files
        private const int END_DOC_TAG = 0x00100101;
        private const int START_TAG = 0x00100102;
        private const int END_TAG = 0x00100103;

        // StringIndexTable starts at offset 0x24, an array of 32 bit LE offsets
        // of the length/string data in the StringTable.
        private const int STRING_INDEX_TABLE_OFFSET = 0x24;

        private readonly byte[] _xml;

        private XDocument _manifest;

        /// <summary>Reads and returns the uncompressed Xml Manifest</summary>
        public XDocument Manifest => _manifest ?? (_manifest = ReadManifest());

        public AndroidManifestReader(byte[] xml)
        {
            _xml = xml ?? throw new ArgumentNullException(nameof(xml));
        }

        private XDocument ReadManifest()
        {
            var result = new XDocument();
            result.Add(new XElement("root"));

            var tagStack = new Stack<XElement>();
            tagStack.Push(result.Root);

            var tagOffset = FindStartOfTags();
            while (tagOffset < _xml.Length)
            {
                var tagCode = BitConverter.ToInt32(_xml, tagOffset);
                switch (tagCode)
                {
                    case START_TAG:
                        tagOffset += ReadStartTag(tagOffset, out var startTag);
                        tagStack.Peek().Add(startTag);
                        tagStack.Push(startTag);
                        break;
                    case END_TAG:
                        var expectedTagName = tagStack.Peek().Name.LocalName;
                        tagOffset += ReadEndTag(tagOffset, expectedTagName);
                        tagStack.Pop();
                        break;
                    case END_DOC_TAG:
                        goto manifest_read;
                    default:
                        goto manifest_read;
                }
            }

            manifest_read:
            return result;
        }

        /// <summary>Reads a start tag and returns the number of consumed bytes.</summary>
        /// <param name="offset"></param>
        /// <param name="element"></param>
        /// <remarks>
        /// XML tags and attributes:
        /// Every XML start and end tag consists of 6 32 bit words:
        ///   0th word: 02011000 for startTag, 03011000 for endTag, 01011000 for end of document
        ///   1st word: a flag?, like 38000000
        ///   2nd word: Line of where this tag appeared in the original source file
        ///   3rd word: FFFFFFFF ??
        ///   4th word: StringIndex of NameSpace name, or FFFFFFFF for default NS
        ///   5th word: StringIndex of Element Name
        /// 
        /// Start tags (not end tags) contain 3 more words:
        ///   6th word: 14001400 meaning?? 
        ///   7th word: Number of Attributes that follow this tag(follow word 8th)
        ///   8th word: 00000000 meaning??
        /// </remarks>
        /// <returns>The number of bytes consumed by the tag.</returns>
        private int ReadStartTag(int offset, out XElement element)
        {
            const int startTagDataSize = 9 * 4;
            element = new XElement(ReadTagName(offset));
            var bytesConsumed = startTagDataSize;

            var attributesCount = BitConverter.ToInt32(_xml, offset + 7 * 4);

            for (var attrIdx = 0; attrIdx < attributesCount; attrIdx++)
            {
                bytesConsumed += ReadAttribute(offset + bytesConsumed, out var attr);
                element.Add(attr);
            }

            return bytesConsumed;
        }

        /// <summary>Reads an attribute starting at the specified offset,
        /// and returns the consumed bytes count.</summary>
        /// <remarks>
        /// Attributes consist of 5 words: 
        ///   0th word: StringIndex of Attribute Name's Namespace, or FFFFFFFF
        ///   1st word: StringIndex of Attribute Name
        ///   2nd word: StringIndex of Attribute Value, or FFFFFFFF if ResourceId used
        ///   3rd word: Flags?
        ///   4th word: str ind of attr value again, or ResourceId of value
        /// </remarks>
        private int ReadAttribute(int offset, out XAttribute attribute)
        {
            var attributeNameStringIndex = BitConverter.ToInt32(_xml, offset + 1 * 4);
            var attrName = RetrieveFromStringTable(attributeNameStringIndex);

            var attributeValueStringIndex = BitConverter.ToInt32(_xml, offset + 2 * 4);
            // AttrValue ResourceId or dup AttrValue StrInd
            var attributeResourceId = BitConverter.ToInt32(_xml, offset + 4 * 4);

            var attrValue = attributeValueStringIndex >= 0
                ? RetrieveFromStringTable(attributeValueStringIndex)
                : attributeResourceId.ToString();

            attribute = new XAttribute(attrName, attrValue);
            return 20;
        }

        private int ReadEndTag(int tagOffset, string expectedTagName)
        {
            var tagName = ReadTagName(tagOffset);
            // Skip over 6 words of endTag data
            return tagName == expectedTagName
                ? 6 * 4
                : throw new InvalidOperationException(
                    $"Malformed XML: expecting {expectedTagName} but found {tagName}");
        }

        private string ReadTagName(int tagOffset)
        {
            var nameIndexOffset = tagOffset + 5 * 4;
            var nameStringIndex = BitConverter.ToInt32(_xml, nameIndexOffset);
            return RetrieveFromStringTable(nameStringIndex);
        }

        private int GetStringTableOffset()
        {
            // Compressed XML file/bytes starts with 0x24 bytes of data,
            // 9 32 bit words in little endian order (LSB first):
            //   0th word is 03 00 08 00
            //   3rd word SEEMS TO BE:  Offset at the end of StringTable
            //   4th word is: Number of strings in string table
            var stringsCount = BitConverter.ToInt32(_xml, 4 * 4);
            return STRING_INDEX_TABLE_OFFSET + stringsCount * 4;
        }

        /// <summary> Returns the string with index strInd from the StringTable.
        /// The index table entry points to the 16 bit string length, which 
        /// is followed by that number of 16 bit (Unicode) chars. </summary>
        /// <returns>The decoded string, or null if strInd is negative.</returns>
        private string RetrieveFromStringTable(int strInd)
        {
            if (strInd < 0) return null;

            // StringTable, each string is represented with a 16 bit little endian 
            // character count, followed by that number of 16 bit (LE) (Unicode) chars.
            // StringTable follows StrIndexTable
            var stringOffset = GetStringTableOffset() +
                               BitConverter.ToInt32(_xml, STRING_INDEX_TABLE_OFFSET + strInd * 4);
            var stringLength = BitConverter.ToInt16(_xml, stringOffset);
            return Encoding.Unicode.GetString(_xml, stringOffset + 2, stringLength * 2);
        }

        /// <summary> The XML tag tree starts after some unknown content after the StringTable.
        /// There is some unknown data after the StringTable, scan forward from this point
        /// to the flag for the start of an XML start tag. </summary>
        /// <returns></returns>
        internal int FindStartOfTags()
        {
            // 12 is the byte offset of the 3rd word, which contains the base offset of the xml tree.
            var xmlTagOffset = BitConverter.ToInt32(_xml, 12);

            for (var offset = xmlTagOffset; offset < _xml.Length - 4; offset += 4)
            {
                if (BitConverter.ToInt32(_xml, offset) == START_TAG) return offset;
            }

            return xmlTagOffset;
        }
    }
}

Usage, where bytes holds the raw AndroidManifest.xml entry extracted from the apk:

var manifest = new AndroidManifestReader(bytes).Manifest;
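
Since the question is about the apk version, here is a minimal sketch of pulling attributes such as versionName out of the resulting XDocument. It assumes the document shape produced by the reader above (a synthetic root element wrapping manifest, with attributes stored under their plain local names); ManifestAttributes and ReadAttribute are hypothetical names introduced for illustration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper built on top of AndroidManifestReader's output.
public static class ManifestAttributes
{
    /// <summary>Returns the value of an attribute of the manifest element,
    /// or null when the attribute is not present.</summary>
    public static string ReadAttribute(XDocument manifest, string attributeName)
    {
        if (manifest == null) throw new ArgumentNullException(nameof(manifest));

        // The reader wraps the real document in a synthetic <root> element.
        XElement element = manifest.Root?.Element("manifest")
            ?? throw new InvalidOperationException("No <manifest> element found.");

        // The explicit cast yields null when the attribute is missing.
        return (string)element.Attribute(attributeName);
    }
}
```

For example, ManifestAttributes.ReadAttribute(manifest, "versionName") would return the version name. Note that, as the reader code shows, a value stored as a resource reference comes back as the numeric resource ID rather than the resolved string.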

Upvotes: 0

Bob

Reputation: 23000

I used SharpZipLib and this answer and created a .Net version.

string apkPath = "C:\\app.apk";
string content = null;

// ZipFile gives direct access to entries, so a separate ZipInputStream is not needed.
using (var zipfile = new ICSharpCode.SharpZipLib.Zip.ZipFile(apkPath))
{
    ICSharpCode.SharpZipLib.Zip.ZipEntry item = zipfile.GetEntry("AndroidManifest.xml");
    if (item != null)
    {
        using (Stream strm = zipfile.GetInputStream(item))
        using (var buffer = new MemoryStream())
        {
            // Copy the whole entry: a single Read call may return fewer bytes
            // than requested, and a fixed 50 KB buffer may be too small.
            strm.CopyTo(buffer);
            AndroidDecompress decompress = new AndroidDecompress();
            content = decompress.decompressXML(buffer.ToArray());
        }
    }
}

and

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

/// <summary>
/// Summary description for AndroidDecompress
/// </summary>
public class AndroidDecompress
{
    private string result = "";
    // decompressXML -- Parse the 'compressed' binary form of Android XML docs 
    // such as for AndroidManifest.xml in .apk files
    public static int endDocTag = 0x00100101;
    public static int startTag = 0x00100102;
    public static int endTag = 0x00100103;
    public string decompressXML(byte[] xml)
    {
        // Compressed XML file/bytes starts with 0x24 bytes of data,
        // 9 32 bit words in little endian order (LSB first):
        //   0th word is 03 00 08 00
        //   3rd word SEEMS TO BE:  Offset at the end of StringTable
        //   4th word is: Number of strings in string table
        // WARNING: Sometimes I indiscriminately display or refer to a word in 
        //   little endian storage format, or in integer format (i.e. MSB first).
        int numbStrings = LEW(xml, 4 * 4);

        // StringIndexTable starts at offset 0x24, an array of 32 bit LE offsets
        // of the length/string data in the StringTable.
        int sitOff = 0x24;  // Offset of start of StringIndexTable

        // StringTable, each string is represented with a 16 bit little endian 
        // character count, followed by that number of 16 bit (LE) (Unicode) chars.
        int stOff = sitOff + numbStrings * 4;  // StringTable follows StrIndexTable

        // XMLTags, The XML tag tree starts after some unknown content after the
        // StringTable.  There is some unknown data after the StringTable, scan
        // forward from this point to the flag for the start of an XML start tag.
        int xmlTagOff = LEW(xml, 3 * 4);  // Start from the offset in the 3rd word.
        // Scan forward until we find the bytes 02 01 10 00 (0x00100102 as a normal int)
        for (int ii = xmlTagOff; ii < xml.Length - 4; ii += 4)
        {
            if (LEW(xml, ii) == startTag)
            {
                xmlTagOff = ii; break;
            }
        } // end of hack, scanning for start of first start tag

        // XML tags and attributes:
        // Every XML start and end tag consists of 6 32 bit words:
        //   0th word: 02011000 for startTag and 03011000 for endTag 
        //   1st word: a flag?, like 38000000
        //   2nd word: Line of where this tag appeared in the original source file
        //   3rd word: FFFFFFFF ??
        //   4th word: StringIndex of NameSpace name, or FFFFFFFF for default NS
        //   5th word: StringIndex of Element Name
        //   (Note: 01011000 in 0th word means end of XML document, endDocTag)

        // Start tags (not end tags) contain 3 more words:
        //   6th word: 14001400 meaning?? 
        //   7th word: Number of Attributes that follow this tag(follow word 8th)
        //   8th word: 00000000 meaning??

        // Attributes consist of 5 words: 
        //   0th word: StringIndex of Attribute Name's Namespace, or FFFFFFFF
        //   1st word: StringIndex of Attribute Name
        //   2nd word: StringIndex of Attribute Value, or FFFFFFFF if ResourceId used
        //   3rd word: Flags?
        //   4th word: str ind of attr value again, or ResourceId of value

        // TMP, dump string table to tr for debugging
        //tr.addSelect("strings", null);
        //for (int ii=0; ii<numbStrings; ii++) {
        //  // Length of string starts at StringTable plus offset in StrIndTable
        //  String str = compXmlString(xml, sitOff, stOff, ii);
        //  tr.add(String.valueOf(ii), str);
        //}
        //tr.parent();

        // Step through the XML tree element tags and attributes
        int off = xmlTagOff;
        int indent = 0;
        int startTagLineNo = -2;
        while (off < xml.Length)
        {
            int tag0 = LEW(xml, off);
            //int tag1 = LEW(xml, off+1*4);
            int lineNo = LEW(xml, off + 2 * 4);
            //int tag3 = LEW(xml, off+3*4);
            int nameNsSi = LEW(xml, off + 4 * 4);
            int nameSi = LEW(xml, off + 5 * 4);

            if (tag0 == startTag)
            { // XML START TAG
                int tag6 = LEW(xml, off + 6 * 4);  // Expected to be 14001400
                int numbAttrs = LEW(xml, off + 7 * 4);  // Number of Attributes to follow
                //int tag8 = LEW(xml, off+8*4);  // Expected to be 00000000
                off += 9 * 4;  // Skip over 6+3 words of startTag data
                String name = compXmlString(xml, sitOff, stOff, nameSi);
                //tr.addSelect(name, null);
                startTagLineNo = lineNo;

                // Look for the Attributes

                string sb = "";
                for (int ii = 0; ii < numbAttrs; ii++)
                {
                    int attrNameNsSi = LEW(xml, off);  // AttrName Namespace Str Ind, or FFFFFFFF
                    int attrNameSi = LEW(xml, off + 1 * 4);  // AttrName String Index
                    int attrValueSi = LEW(xml, off + 2 * 4); // AttrValue Str Ind, or FFFFFFFF
                    int attrFlags = LEW(xml, off + 3 * 4);
                    int attrResId = LEW(xml, off + 4 * 4);  // AttrValue ResourceId or dup AttrValue StrInd
                    off += 5 * 4;  // Skip over the 5 words of an attribute

                    String attrName = compXmlString(xml, sitOff, stOff, attrNameSi);
                    String attrValue = attrValueSi != -1
                      ? compXmlString(xml, sitOff, stOff, attrValueSi)
                      : /*"resourceID 0x" + */attrResId.ToString();
                    sb += " " + attrName + "=\"" + attrValue + "\"";
                    //tr.add(attrName, attrValue);
                }
                prtIndent(indent, "<" + name + sb + ">");
                indent++;

            }
            else if (tag0 == endTag)
            { // XML END TAG
                indent--;
                off += 6 * 4;  // Skip over 6 words of endTag data
                String name = compXmlString(xml, sitOff, stOff, nameSi);
                prtIndent(indent, "</" + name + ">  \r\n"/*+"(line " + startTagLineNo + "-" + lineNo + ")"*/);
                //tr.parent();  // Step back up the NobTree

            }
            else if (tag0 == endDocTag)
            {  // END OF XML DOC TAG
                break;

            }
            else
            {
                prt("  Unrecognized tag code '" + tag0.ToString("X")
                  + "' at offset " + off);
                break;
            }
        } // end of while loop scanning tags and attributes of XML tree
        //prt("    end at offset " + off);


        return result;
    } // end of decompressXML


    public String compXmlString(byte[] xml, int sitOff, int stOff, int strInd)
    {
        if (strInd < 0) return null;
        int strOff = stOff + LEW(xml, sitOff + strInd * 4);
        return compXmlStringAt(xml, strOff);
    }


    public static String spaces = "                                             ";
    public void prtIndent(int indent, String str)
    {
        prt(spaces.Substring(0, Math.Min(indent * 2, spaces.Length)) + str);
    }

    private void prt(string p)
    {
        result += p;
    }


    // compXmlStringAt -- Return the string stored in StringTable format at
    // offset strOff.  This offset points to the 16 bit string length, which 
    // is followed by that number of 16 bit (Unicode) chars.
    public String compXmlStringAt(byte[] arr, int strOff)
    {
        int strLen = arr[strOff + 1] << 8 & 0xff00 | arr[strOff] & 0xff;
        // Decode the chars as UTF-16 LE so that non-ASCII text is preserved.
        return System.Text.Encoding.Unicode.GetString(arr, strOff + 2, strLen * 2);
    } // end of compXmlStringAt


    // LEW -- Return value of a Little Endian 32 bit word from the byte array
    //   at offset off.
    public int LEW(byte[] arr, int off)
    {
        return (int)(arr[off + 3] << 24 & 0xff000000 | arr[off + 2] << 16 & 0xff0000 | arr[off + 1] << 8 & 0xff00 | arr[off] & 0xFF);
    } // end of LEW

}
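
The comments above note that a compiled manifest starts with the bytes 03 00 08 00 and a 0x24 byte header. As a hedged sketch, a guard based on that observation could be run before calling decompressXML, so that plain-text XML or truncated data is rejected early. BinaryXmlHeader and LooksLikeBinaryXml are hypothetical names, and the check relies only on the header layout described in the comments.

```csharp
using System;

// Hypothetical guard; the magic value is taken from the comments in the parser above.
public static class BinaryXmlHeader
{
    // Bytes 03 00 08 00 read as a little-endian 32 bit word.
    private const int BinaryXmlMagic = 0x00080003;

    // Size of the fixed header described above (9 words of 4 bytes).
    private const int HeaderSize = 0x24;

    public static bool LooksLikeBinaryXml(byte[] data)
    {
        if (data == null || data.Length < HeaderSize) return false;

        // Assemble the first word explicitly in little-endian order.
        int firstWord = data[0] | data[1] << 8 | data[2] << 16 | data[3] << 24;
        return firstWord == BinaryXmlMagic;
    }
}
```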

Upvotes: 15
