Reputation: 16764
I was confronted with a tricky (IMO) question. I needed to compare two MAC addresses, in the most efficient manner.
The only thought that crossed my mind in that moment was the trivial solution - a for
loop, and comparing locations, and so I did, but the interviewer was aiming to casting.
The MAC definition:
typedef struct macA {
char data[6];
} MAC;
And the function is (the one I was asked to implement):
int isEqual(MAC* addr1, MAC* addr2)
{
int i;
for(i = 0; i<6; i++)
{
if(addr1->data[i] != addr2->data[i])
return 0;
}
return 1;
}
But as mentioned, he was aiming for casting.
Meaning, to somehow cast the MAC address given to an int, compare both of the addresses, and return.
But when casting, int int_addr1 = (int)addr1;
, only four bytes will be casted, right? Should I check the remaining ones? Meaning locations 4 and 5?
Both char
and int
are integer types so casting is legal, but what happens
in the described situation?
Upvotes: 60
Views: 7507
Reputation: 10333
By the way, for those truly looking for a performant answer, the following is branchless, and while it does more fetches (one per char), they should all be from the same cache line, so not very expensive.
int isEqual(MAC* addr1, MAC* addr2)
{
return
(addr1->data[0] == addr2->data[0])
& (addr1->data[1] == addr2->data[1])
& (addr1->data[2] == addr2->data[2])
& (addr1->data[3] == addr2->data[3])
& (addr1->data[4] == addr2->data[4])
& (addr1->data[5] == addr2->data[5])
;
}
See it live (and branchless) here
Upvotes: 0
Reputation: 2306
There is nothing wrong with an efficient implementation, for all you know this has been determined to be hot code that is called many many times. And in any case, its okay for interview questions to have odd constraints.
Logical AND is a priori a branching instruction due to short-circuit evaluation even if it doesn't compile this way, so lets avoid it, we don't need it. Nor do we need to convert our return value to a true bool (true or false, not 0 or anything that's not zero).
Here is a fast solution on 32-bit: XOR will capture the differences, OR will record difference in both parts, and NOT will negate the condition into EQUALS, not UNEQUAL. The LHS and RHS are independent computations, so a superscalar processor can do this in parallel.
int isEqual(MAC* addr1, MAC* addr2)
{
return ~((*(int*)addr2 ^ *(int*)addr1) | (int)(((short*)addr2)[2] ^ ((short*)addr1)[2]));
}
EDIT
The purpose of the above code was to show that this could be done efficiently without branching. Comments have pointed out this C++ classifies this as undefined behavior. While true, VS handles this fine. Without changing the interviewer's struct definition and function signature, in order to avoid undefined behavior an extra copy must be made. So the non-undefined behavior way without branching but with an extra copy would be as follows:
int isEqual(MAC* addr1, MAC* addr2)
{
struct IntShort
{
int i;
short s;
};
union MACU
{
MAC addr;
IntShort is;
};
MACU u1;
MACU u2;
u1.addr = *addr1; // extra copy
u2.addr = *addr2; // extra copy
return ~((u1.is.i ^ u2.is.i) | (int)(u1.is.s ^ u2.is.s)); // still no branching
}
Upvotes: 14
Reputation: 5290
To do type punning correctly you have to use an union. Otherwise you will break the rules strict aliasing which certain compilers follow, and the result will be undefined.
int EqualMac( MAC* a , MAC* b )
{
union
{
MAC m ;
uint16_t w[3] ;
} ua , ub ;
ua.m = *a ;
ub.m = *b ;
if( ua.w[0] != ub.w[0] )
return 0 ;
if( ua.w[1] != ub.w[1] )
return 0 ;
if( ua.w[2] != ub.w[2] )
return 0 ;
return 1 ;
}
According to C99 it is safe to read from an union member that is not the last used to store a value in it.
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
Upvotes: 1
Reputation: 477040
If your interviewer demands that you produce undefined behavior, I would probably look for a job elsewhere.
The correct initial approach would be to store the MAC address in something like a uint64_t
, at least in-memory. Then comparisons would be trivial, and implementable efficiently.
Upvotes: 46
Reputation: 1863
May be he had in mind a definition of MAC that used unsigned char and was thinking to:
int isEqual(MAC* addr1, MAC* addr2) { return strncmp((*addr1).data,(*addr2).data,6)==0; }
which implies a cast from (unsigned char *) to (char *). Anyway bad question.
Upvotes: 0
Reputation: 153458
Non-portable casting solution.
In a platform I use (PIC24 based), there is a type int48
, so making a safe assumption char
is 8 bits and the usual alignment requirements:
int isEqual(MAC* addr1, MAC* addr2) {
return *((int48_t*) &addr1->data) == *((int48_t*) &addr2->data);
}
Of course, this is not usable on many platforms, but then so are a number of solutions that are not portable either, depending on assumed int
size, no padding
, etc.
The highest portable solution (and reasonably fast given a good compiler) is the memcmp()
offered by @H2CO3.
Going to a higher design level and using a wide enough integer type like uint64_t
instead of struct macA
, as suggested by Kerrek SB, is very appealing.
Upvotes: 3
Reputation: 30136
Function memcmp will eventually do the loop itself. So by using it, you would basically just make things less efficient (due to the additional function-call).
Here is an optional solution:
typedef struct
{
int x;
short y;
}
MacAddr;
int isEqual(MAC* addr1, MAC* addr2)
{
return *(MacAddr*)addr1 == *(MacAddr*)addr2;
}
The compiler will most likely convert this code into two comparisons, since the MacAddr structure contains two fields.
Cavity: unless your CPU supports unaligned load/store operations, addr1 and addr2 must be aligned to 4 bytes (i.e., they must be located in addresses that are divisible by 4). Otherwise, a memory access violation will most likely occur when the function is executed.
You may divide the structure into 3 fields of 2 bytes each, or 6 fields of 1 byte each (reducing the alignment restriction to 2 or 1 respectively). But bare in mind that a single comparison in your source code is not necessarily a single comparison in the executable image (i.e., during runtime).
BTW, unaligned load/store operations by themselves may add runtime latency, if they require more "nops" in the CPU pipeline. This is really a matter of CPU architecture, which I doubt they meant to "dig into" that far in your job interview. However, in order to assert that the compiled code does not contain such operations (if indeed they are "expensive"), you could ensure that the variables are always aligned to 8 bytes AND add a #pragma (compiler directive) telling the compiler "not to worry about this".
Upvotes: 0
Reputation: 4444
You have a MAC structure (which contains an array of 6 bytes),
typedef struct {
char data[6];
} MAC;
Which agrees with this article about typedef for fixed length byte array.
The naive approach would be to assume the MAC address is word aligned (which is probably what the interviewer wanted), albeit not guaranteed.
typedef unsigned long u32;
typedef signed long s32;
typedef unsigned short u16;
typedef signed short s16;
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1); //check for NULL args
u32 m1 = *(u32*)mac1->data;
U32 m2 = *(u32*)mac2->data;
if( m1 != m2 ) return (s32)m1 - (s32)m2;
u16 m3 = *(u16*)(mac1->data+4);
u16 m2 = *(u16*)(mac2->data+4);
return (s16)m3 - (s16)m4;
}
Slightly safer would be to interpret the char[6] as a short[3] (MAC more likely to be aligned on even byte boundaries than odd),
typedef unsigned short u16;
typedef signed short s16;
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1); //check for NULL args
u16* p1 = (u16*)mac1->data;
u16* p2 = (u16*)mac2->data;
for( n=0; n<3; ++n ) {
if( *p1 != *p2 ) return (s16)*p1 - (s16)*p2;
}
return(0);
}
Assume nothing, and copy to word aligned storage, but the only reason for typecasting here is to satisfy the interviewer,
typedef unsigned short u16;
typedef signed short s16;
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1); //check for NULL args
u16 m1[3]; u16 p2[3];
memcpy(m1,mac1->data,6);
memcpy(m2,mac2->data,6);
for( n=0; n<3; ++n ) {
if( m1[n] != m2[n] ) return (s16)m1[n] - (s16)m2[n];
}
return(0);
}
Save yourself lots of work,
int
MACcmp(MAC* mac1, MAC* mac2)
{
if(!mac1 || !mac2) return(-1);
return memcmp(mac1->data,mac2->data,6);
}
Upvotes: 0
Reputation: 10333
Cowboy time:
typedef struct macA {
char data[6];
} MAC;
typedef struct sometimes_works {
long some;
short more;
} cast_it
typedef union cowboy
{
MAC real;
cast_it hack;
} cowboy_cast;
int isEqual(MAC* addr1, MAC* addr2)
{
assert(sizeof(MAC) == sizeof(cowboy_cast)); // Can't be bigger
assert(sizeof(MAC) == sizeof(cast_it)); // Can't be smaller
if ( ( ((cowboy_cast *)addr1)->hack.some == ((cowboy_cast *)addr2)->hack.some )
&& ( ((cowboy_cast *)addr1)->hack.more == ((cowboy_cast *)addr2)->hack.more ) )
return (0 == 0);
return (0 == 42);
}
Upvotes: 16
Reputation: 2988
This would work on most systems,and be faster than your solution.
int isEqual(MAC* addr1, MAC* addr2)
{
return ((int32*)addr1)[0] == ((int32*)addr2)[0] && ((int16*)addr1)[2] == ((int16*)addr2)[2];
}
would inline nicely too, could be handy at the center of loop on a system where you can check the details are viable.
Upvotes: 4
Reputation:
If he is really dissatisfied with this approach (which is essentially a brain fart, since you aren't comparing megabytes or gigabytes of data, so one shan't really be worrying about "efficiency" and "speed" in this case), just tell him that you trust the quality and speed of the standard library:
int isEqual(MAC* addr1, MAC* addr2)
{
return memcmp(&addr1->data, &addr2->data, sizeof(addr1->data)) == 0;
}
Upvotes: 114