Reputation: 326
I've been implementing the same code (the number of ways to deal a hand in Blackjack without busting) in a variety of languages and implementations. One oddity I've noticed is that the implementation of Python calling the partitions function in C is actually slightly faster than the entire program written in C. The same appears to be true for other languages (Ada vs Python calling Ada, Nim vs Python calling Nim). This seems counterintuitive to me - any idea how this is possible?
The code is all in my GitHub repo here:
https://github.com/octonion/puzzles/tree/master/blackjack
Here's the C code, compiled using 'gcc -O3 outcomes.c'.
#include <stdio.h>
int partitions(int cards[10], int subtotal)
{
//writeln(cards,subtotal);
int m = 0;
int total;
// Hit
for (int i = 0; i < 10; i++)
{
if (cards[i] > 0)
{
total = subtotal + i + 1;
if (total < 21)
{
// Stand
m += 1;
// Hit again
cards[i] -= 1;
m += partitions(cards, total);
cards[i] += 1;
}
else if (total == 21)
{
// Stand; hit again is an automatic bust
m += 1;
}
}
}
return m;
}
int main(void)
{
int deck[] =
{ 4, 4, 4, 4, 4, 4, 4, 4, 4, 16 };
int d = 0;
for (int i = 0; i < 10; i++)
{
// Dealer showing
deck[i] -= 1;
int p = 0;
for (int j = 0; j < 10; j++)
{
deck[j] -= 1;
int n = partitions(deck, j + 1);
deck[j] += 1;
p += n;
}
printf("Dealer showing %i partitions = %i\n", i, p);
d += p;
deck[i] += 1;
}
printf("Total partitions = %i\n", d);
return 0;
}
Here's the C function, compiled using 'gcc -O3 -fPIC -shared -o libpartitions.so partitions.c'.
int partitions(int cards[10], int subtotal)
{
int m = 0;
int total;
// Hit
for (int i = 0; i < 10; i++)
{
if (cards[i] > 0)
{
total = subtotal + i + 1;
if (total < 21)
{
cards[i] -= 1;
// Stand
m += 1;
// Hit again
m += partitions(cards, total);
cards[i] += 1;
}
else if (total == 21)
{
// Stand; hit again is an automatic bust
m += 1;
}
}
}
return m;
}
And here's the Python wrapper for the C function:
#!/usr/bin/env python
from ctypes import *
import os
test_lib = cdll.LoadLibrary(os.path.abspath("libpartitions.so"))
test_lib.partitions.argtypes = [POINTER(c_int), c_int]
test_lib.partitions.restype = c_int
deck = ([4]*9)
deck.append(16)
d = 0
for i in xrange(10):
# Dealer showing
deck[i] -= 1
p = 0
for j in xrange(10):
deck[j] -= 1
nums_arr = (c_int*len(deck))(*deck)
n = test_lib.partitions(nums_arr, c_int(j+1))
deck[j] += 1
p += n
print('Dealer showing ', i,' partitions =',p)
d += p
deck[i] += 1
print('Total partitions =',d)
Upvotes: 4
Views: 552
Reputation: 1268
I think the reason here is how GCC compiles function partitions
in 2 cases. You can compare asm code in outcomes
binary executable and libpartitions.so
by using objdump
to see the differences.
objdump -d -M intel <file name>
When building to shared library, GCC has no idea how partitions
is called. While in C program, GCC know exactly when partitions
is called (in this case, however, lead to worse performance). This difference in context makes GCC does optimization differently.
You can try different compilers to compare the result. I have checked with GCC 5.4 and Clang 6.0. With GCC 5.4, the Python script runs faster while with Clang, the C program runs faster.
Upvotes: 1