ρss

Reputation: 5315

Regarding struct.unpack() in python

Level: Beginner. I am currently working on sniffers in Python using raw sockets. I have a general question about the format specifiers used with unpack() from the struct module, which unpacks data according to the format string it is given. I have seen a lot of sniffer programs use unpack() to decode packet information from its raw binary form. For example, the following code can be used to extract the Ethernet header information:

ethHeader = struct.unpack("!6s6s2s", ethernetHeader)

Here ethernetHeader is a variable that contains the actual Ethernet header data captured earlier from a raw socket. My question is: how can one know which format specifiers to use for a header? How can I know in advance whether the Ethernet addresses are in string format or some other format? Is there any documentation for this? I read the Python docs for unpack() but didn't find any information. Similarly, for IP addresses the code is something like this:

ipAddresses = struct.unpack("!12s4s4s", IPAddresses)

Here IPAddresses is a variable that contains the IP address information captured earlier from a raw socket. Once again, how can I know that I have to use strings as format specifiers (!12s4s4s)? Thanks.

Upvotes: 2

Views: 10294

Answers (2)

ρss

Reputation: 5315

Thanks to J.F. Sebastian for the hint. I finally figured it out and will take some time to explain it here. Normally we have to look at the C struct definition of each header to see which C type is used for each field in the different headers of a packet. We can then use the format character table in the struct module documentation to find which format specifier represents which C type. For example, the struct for the IP header is given below:

struct ipheader {
 unsigned char ip_hl:4, ip_v:4; /* this means that each member is 4 bits */
 unsigned char ip_tos;
 unsigned short int ip_len;
 unsigned short int ip_id;
 unsigned short int ip_off;
 unsigned char ip_ttl;
 unsigned char ip_p;
 unsigned short int ip_sum;
 unsigned int ip_src;
 unsigned int ip_dst;
}; 

For example, unsigned char is represented by 'B', unsigned short by 'H', and unsigned int by 'I'. We can use this mapping to work out which format specifiers struct.unpack() needs in order to get the field values of an IP header. The leading '!' selects network (big-endian) byte order with no padding. For the IP header above, the call becomes:

struct.unpack('!BBHHHBBHII')
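As a rough sketch of that mapping, the snippet below builds a made-up 20-byte IPv4 header and unpacks it with the same format string. The names sample_header, ver_ihl and the field values are assumptions for illustration, not data from a real capture; in a sniffer the bytes would come from the raw socket instead.

```python
import struct

IP_FORMAT = "!BBHHHBBHII"

# Hypothetical header bytes, packed by hand purely for illustration.
sample_header = struct.pack(
    IP_FORMAT,
    (4 << 4) | 5,   # version (4) and header length (5 words) share one byte
    0,              # ip_tos
    40,             # ip_len
    0x1C46,         # ip_id
    0x4000,         # ip_off (flags and fragment offset)
    64,             # ip_ttl
    6,              # ip_p (6 = TCP)
    0,              # ip_sum (left at zero here, so not a valid checksum)
    0xC0A80001,     # ip_src = 192.168.0.1
    0xC0A800C7,     # ip_dst = 192.168.0.199
)

# One Python value comes back per format character.
fields = struct.unpack(IP_FORMAT, sample_header)
ver_ihl, tos, total_len, ident, frag, ttl, proto, checksum, src, dst = fields

# The two 4-bit members have to be split out of their shared byte by hand.
version = ver_ihl >> 4
ihl_words = ver_ihl & 0x0F

print(struct.calcsize(IP_FORMAT), version, ihl_words, ttl, proto, src, dst)
```

struct.calcsize() is a handy check that the format string covers exactly the 20 bytes of a header without options.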

But you will notice that most programs use struct.unpack('!BBHHHBBH4s4s').

So the question arises: why is 's' used instead of 'I' as the format specifier for unsigned int ip_src; and unsigned int ip_dst; in struct.unpack()? The reason is that if 'I' is used, unpack() returns each IP address as an integer (e.g. 3232267778), which you then have to convert to dotted form yourself (3232267778 is 192.168.126.2). Most sniffer programs available on the internet simply use socket.inet_ntoa() to obtain the readable IP address. That function accepts a packed 4-byte string (a bytes object in Python 3), not an integer. So '4s' is used for ip_src and ip_dst so that the result can be fed straight to socket.inet_ntoa(). The Ethernet header is similar: we use 's' instead of 'B' in struct.unpack() because we need a string that can later be passed to binascii.hexlify() to get the MAC address in its usual hexadecimal form.
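The difference between 'I' and '4s' can be seen in a small sketch. It assumes Python 3, and the packed_ip and packed_mac values are made up for illustration rather than taken from a capture.

```python
import binascii
import socket
import struct

# Hypothetical packed values; a sniffer would slice these out of a packet.
packed_ip = bytes([192, 168, 126, 2])
packed_mac = bytes([0x00, 0x1A, 0x2B, 0x3C, 0x4D, 0x5E])

# 'I' yields an integer, which must be packed again before inet_ntoa accepts it.
(as_int,) = struct.unpack("!I", packed_ip)
from_int = socket.inet_ntoa(struct.pack("!I", as_int))

# '4s' yields the raw 4 bytes, which inet_ntoa accepts directly.
(as_bytes,) = struct.unpack("!4s", packed_ip)
from_bytes = socket.inet_ntoa(as_bytes)

# '6s' yields the raw 6 bytes of a MAC address, ready for hexlify.
(mac_raw,) = struct.unpack("!6s", packed_mac)
mac_hex = binascii.hexlify(mac_raw).decode("ascii")
mac_text = ":".join(mac_hex[i:i + 2] for i in range(0, 12, 2))

print(as_int, from_int, from_bytes, mac_text)
```

Both routes give the same dotted address; '4s' just skips the round trip through an integer.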

Upvotes: 5

jfs

Reputation: 414835

struct.unpack allows you to convert a sequence of bytes that contains C types specified in the format (the first argument) into corresponding Python objects (integer, float, string).

It is generic.

how can one know which format specifier to use for a header? How can I know in advance that the ethernet addresses are in string format or in some other format? Is there any documentation for this too. I read python docs related to unpack() but didn't find any info.

The struct module knows nothing about the formats your application might need. The format is specific to your application; in this case it comes from the TCP/IP suite, its protocols, sniffers, and networking. Read about these to figure out what C types to expect in ethernetHeader, IPAddresses, etc., and then create the appropriate format string using the format character table in the struct documentation.
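To illustrate deriving a format string from a protocol layout: an Ethernet II header is a 6-byte destination address, a 6-byte source address and a 2-byte EtherType. The sketch below assumes Python 3 and uses a hypothetical sample_eth_header; it unpacks the EtherType with 'H' rather than the '2s' from the question, which is one possible choice and not the only valid one.

```python
import struct

# Layout taken from the protocol definition: 6 + 6 + 2 bytes, network byte order.
ETH_FORMAT = "!6s6sH"

# Hypothetical header bytes: broadcast destination, made-up source, EtherType 0x0800.
sample_eth_header = bytes.fromhex("ffffffffffff" "001a2b3c4d5e" "0800")

dst_mac, src_mac, ether_type = struct.unpack(ETH_FORMAT, sample_eth_header)

# 0x0800 is the EtherType for IPv4.
print(struct.calcsize(ETH_FORMAT), dst_mac, src_mac, hex(ether_type))
```

Using 'H' gives the EtherType as an integer that can be compared directly; '2s' would keep it as two raw bytes.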

Upvotes: 1
