Reputation: 65
When parsing a TCP header, there's a 4-bit field named data offset. While parsing, multi-byte fields need to be converted to host byte order. Here's the question: for fields that are not 16 or 32 bits long, which means I can't use ntohs or ntohl, do I reverse them field-wise or byte-wise, or in another way?
Let's suppose one byte contains two fields, f1 and f2, each 4 bits wide, and the data is 1000 0100. For the field-wise reversal, the result should be 0001 0010. For the byte-wise reversal, the result is 0010 0001. Which one is correct?
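To make the distinction concrete, here is a small sketch of the two candidate operations (rev4 is just a helper written for this example):

#include <stdint.h>
#include <stdio.h>

/* Reverse the order of the low 4 bits. */
static uint8_t rev4(uint8_t n) {
    uint8_t r = 0;
    for (int i = 0; i < 4; i++)
        if (n & (1u << i))
            r |= (uint8_t)(1u << (3 - i));
    return r;
}

int main(void) {
    uint8_t b = 0x84;               /* 1000 0100 */
    uint8_t f1 = (uint8_t)(b >> 4); /* 1000 */
    uint8_t f2 = b & 0x0f;          /* 0100 */

    /* Field-wise: reverse the bits inside each field, keep field order. */
    uint8_t field_wise = (uint8_t)((rev4(f1) << 4) | rev4(f2));

    /* Byte-wise: reverse all 8 bits of the byte. */
    uint8_t byte_wise = (uint8_t)((rev4(f2) << 4) | rev4(f1));

    /* Prints field-wise: 0x12 (0001 0010), byte-wise: 0x21 (0010 0001). */
    printf("field-wise: 0x%02x, byte-wise: 0x%02x\n",
           (unsigned)field_wise, (unsigned)byte_wise);
    return 0;
}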
Update: Here is the struct I'm using to parse the header:
#pragma pack(push, 1)
struct tcp_hdr_t {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;
    uint32_t ack;
    uint8_t data_offset : 4;
    uint8_t f_reserved : 3;
    uint8_t f_ns : 1;
    uint8_t f_cwr : 1;
    uint8_t f_ece : 1;
    uint8_t f_urg : 1;
    uint8_t f_ack : 1;
    uint8_t f_psh : 1;
    uint8_t f_rst : 1;
    uint8_t f_syn : 1;
    uint8_t f_fin : 1;
    uint16_t window_size;
    uint16_t checksum;
    uint16_t urgent_p;
};
#pragma pack(pop)
If I don't reverse the data offset and flag fields, the result is wrong compared with what Wireshark shows. The raw data is 0xa002, and the data offset comes out as 0xa, so that field doesn't seem to need reversing, but the flags part seems reversed.
Upvotes: 1
Views: 1027
Reputation: 224882
The problem you're seeing has to do with how bit fields are implemented.
From section 6.7.2.1p11 of the C standard regarding structure and union specifiers:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
What this means is that you can't portably depend on the ordering of bit fields. As an example of this, the file /usr/include/netinet/tcp.h on Linux contains the following:
struct tcphdr
{
    __extension__ union
    {
        struct
        {
            u_int16_t th_sport; /* source port */
            u_int16_t th_dport; /* destination port */
            tcp_seq th_seq;     /* sequence number */
            tcp_seq th_ack;     /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
            u_int8_t th_x2:4;   /* (unused) */
            u_int8_t th_off:4;  /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
            u_int8_t th_off:4;  /* data offset */
            u_int8_t th_x2:4;   /* (unused) */
# endif
            u_int8_t th_flags;
# define TH_FIN 0x01
# define TH_SYN 0x02
# define TH_RST 0x04
# define TH_PUSH 0x08
# define TH_ACK 0x10
# define TH_URG 0x20
            u_int16_t th_win;   /* window */
            u_int16_t th_sum;   /* checksum */
            u_int16_t th_urp;   /* urgent pointer */
        };
        struct
        {
            u_int16_t source;
            u_int16_t dest;
            u_int32_t seq;
            u_int32_t ack_seq;
# if __BYTE_ORDER == __LITTLE_ENDIAN
            u_int16_t res1:4;
            u_int16_t doff:4;
            u_int16_t fin:1;
            u_int16_t syn:1;
            u_int16_t rst:1;
            u_int16_t psh:1;
            u_int16_t ack:1;
            u_int16_t urg:1;
            u_int16_t res2:2;
# elif __BYTE_ORDER == __BIG_ENDIAN
            u_int16_t doff:4;
            u_int16_t res1:4;
            u_int16_t res2:2;
            u_int16_t urg:1;
            u_int16_t ack:1;
            u_int16_t psh:1;
            u_int16_t rst:1;
            u_int16_t syn:1;
            u_int16_t fin:1;
# else
# error "Adjust your <bits/endian.h> defines"
# endif
            u_int16_t window;
            u_int16_t check;
            u_int16_t urg_ptr;
        };
    };
};
You can see here the hoops that need to be jumped through to get things in the right place. Other implementations might do it differently.
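As a quick experiment on your own compiler, you can overlay a two-field bit-field struct on a known byte and see which field picks up which bits. A sketch:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct nibbles {
    uint8_t first : 4;  /* declared first */
    uint8_t second : 4; /* declared second */
};

int main(void) {
    uint8_t raw = 0xa0; /* the offset/reserved byte from the question */
    struct nibbles n;
    memcpy(&n, &raw, 1);

    /* On a typical little-endian ABI (e.g. GCC on x86) the first-declared
       field occupies the low-order bits, so this prints first=0 second=a;
       another implementation is free to print first=a second=0. */
    printf("first=%x second=%x\n", (unsigned)n.first, (unsigned)n.second);
    return 0;
}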
The best way to handle this in your code is to get rid of the bit fields and replace them with a pair of uint8_t members, then use bitmasks to extract the necessary subfields.
For example:
struct tcp_hdr_t {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;
    uint32_t ack;
    uint8_t offset_flags1;
    uint8_t flags2;
    uint16_t window_size;
    uint16_t checksum;
    uint16_t urgent_p;
};
#define DATA_OFFSET(hdr) (((hdr).offset_flags1 & 0xf0) >> 4)
#define FLAG_NONCE(hdr) (((hdr).offset_flags1 & 0x01) >> 0)
#define FLAG_CWR(hdr) (((hdr).flags2 & 0x80) >> 7)
#define FLAG_ECE(hdr) (((hdr).flags2 & 0x40) >> 6)
#define FLAG_URG(hdr) (((hdr).flags2 & 0x20) >> 5)
#define FLAG_ACK(hdr) (((hdr).flags2 & 0x10) >> 4)
#define FLAG_PSH(hdr) (((hdr).flags2 & 0x08) >> 3)
#define FLAG_RST(hdr) (((hdr).flags2 & 0x04) >> 2)
#define FLAG_SYN(hdr) (((hdr).flags2 & 0x02) >> 1)
#define FLAG_FIN(hdr) (((hdr).flags2 & 0x01) >> 0)
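Using the struct and macros above might look like this (a sketch; buf is assumed to point at the raw bytes of a captured TCP header):

#include <arpa/inet.h> /* ntohs, ntohl */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void parse_tcp(const uint8_t *buf)
{
    struct tcp_hdr_t hdr;
    memcpy(&hdr, buf, sizeof hdr); /* copy out to avoid alignment problems */

    /* The 16- and 32-bit fields are converted with ntohs/ntohl as usual. */
    unsigned src = ntohs(hdr.src_port);
    unsigned dst = ntohs(hdr.dst_port);
    unsigned long seq = ntohl(hdr.seq);

    /* The two flag bytes have no byte order to convert; the macros just
       mask out the bits. For the 0xa0 0x02 bytes from the question this
       reports a data offset of 10 with only SYN set. */
    printf("%u -> %u seq %lu: offset=%u syn=%u ack=%u\n",
           src, dst, seq,
           (unsigned)DATA_OFFSET(hdr), (unsigned)FLAG_SYN(hdr),
           (unsigned)FLAG_ACK(hdr));
}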
Upvotes: 1
Reputation: 6779
You get your answer the moment you say host byte order.
As in your question, if you have 1000 0100:
0001 0010 is a bit-wise reversal within each nibble, and that has nothing to do with byte order.
0010 0001 is a nibble-wise reversal, and that too has nothing to do with byte order.
(1 nibble = 4 bits, 1 byte = 8 bits.)
From the popular Beej's Guide:
Just to make you really unhappy, different computers use different byte orderings internally for their multibyte integers (i.e. any integer that's larger than a char.) The upshot of this is that if you send() a two-byte short int from an Intel box to a Mac (before they became Intel boxes, too, I mean), what one computer thinks is the number 1, the other will think is the number 256, and vice-versa.
The way to get around this problem is for everyone to put aside their differences and agree that Motorola and IBM had it right, and Intel did it the weird way, and so we all convert our byte orderings to "big-endian" before sending them out. Since Intel is a "little-endian" machine, it's far more politically correct to call our preferred byte ordering "Network Byte Order". So these functions convert from your native byte order to network byte order and back again.
If you are dealing with just 1 byte, you don't even need to bother doing anything.
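For instance, extracting the data offset from that byte is just a shift and a mask, which operate on the value rather than on memory layout, so the result is the same on little- and big-endian hosts. A sketch:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t offset_byte = 0xa0; /* first byte of the 0xa002 from the question */

    uint8_t data_offset = offset_byte >> 4;   /* high nibble: 10 */
    uint8_t reserved_ns = offset_byte & 0x0f; /* low nibble: 0 */

    /* No ntohs/ntohl-style conversion is needed for a single byte. */
    printf("data offset=%u reserved/NS=%u\n",
           (unsigned)data_offset, (unsigned)reserved_ns);
    return 0;
}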
Upvotes: 3