Reputation: 83
Magic Numbers
A positive integer is “magic” if, and only if, it can be reduced to 1 by repeatedly dividing it by 2 if it’s even or multiplying it by 3 and then adding 1 if it’s odd. So, for example, 3 is magic because 3 reduces first to 10 (3*3+1), then to 5 (10/2), then to 16 (5*3+1), then to 8 (16/2), then to 4 (8/2), then to 2 (4/2), and finally to 1 (2/2). The magic numbers hypothesis states that all positive integers are magic, or, formally: ∀x ∈ Z, MAGIC(x) where MAGIC(x) is the predicate “x is magic”.
We are supposed to develop a C++ program to find "Magic Numbers" from 1 to 5 Billion. It is supposed to take 80 seconds or less if being done correctly. Mine takes approximately 2 hours, and the example program we were given would take 14 days. Here is my code, what can I do to speed it up? Am I missing obvious optimization issues?
#include <iostream>
using namespace std;
bool checkMagic(unsigned long number);
int main()
{
for(unsigned long i = 1; i <= 5000000000; i++)
{
if(i % 1000000 == 0)
{
//only print out every 1000000 numbers to keep track, but slow it down
//by printing out every single number
cout<<i<<endl;
}
if(!checkMagic(i))
{
//not magic
cout<<i<<" not a magic number"<<endl;
break;
}
}
}
bool checkMagic(unsigned long number)
{
if(number % 2 == 0)
{
number = number / 2;
}
else
{
number = (number * 3) + 1;
}
if(number != 1)
{
checkMagic(number);
}
return 1;
}
Upvotes: 8
Views: 4855
Reputation: 5566
I was speculating about what changes the optimizer might have made for this particular tail recursion.
So I created an iterative answer from my V9 code (in my other answer).
Change From:
bool checkMagicR(uint64_t n)
{
//.. replace all
}
Change To:
// ITERATIVE VERSION
bool checkMagicR(uint64_t n) // ITERATIVE Performance
{
do
{ uint64_t nxt;
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause.
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause.
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
n = nxt;
} while(true); // NO recursion
}
Did not fix the name. Did not fix comments or other mentioning of 'recursion' in this method or others. I changed some comments at top and bottom of this method.
Results:
Performance is identical. (44 seconds)
Mode 1 and mode 2 are both iterative.
Mode 3 is (still) recursive.
Upvotes: 0
Reputation: 5566
what can I do to speed it up? Am I missing obvious optimization issues?
A for-loop is not the fastest coding style to invoke a method 5B times.
In my code see unwind100() and innerLoop(). The unwind'ing I coded removed 99% of the innerLoop overhead, which saved more than 5 seconds (on my desktop DD5). Your savings may vary. Wikipedia has an article about unwinding a loop, with a brief description of the savings potential.
This code apparently generates progress reports. At a cost of 5 billion comparisons, it accomplishes nothing for the magic number checks.
See my code outerLoop, which calls innerLoop 10 times. This provides a convenient location for 1 progress report each outer loop. Consider splitting your loops into an 'innerLoop' and an 'outerLoop'. Testing 5 Billion times is more expensive than adding 10 calls to innerLoops.
I think your checkMagic() does not achieve tail-recursion, which really will slow your code down.
See my method checkMagicR(), which has tail recursion, and returns a true or false.
Also, in your CheckMagic(), in the clause for odd input, you ignore the idea that (3n+1) must be an even number when n is positive and odd. My code performs an 'early-div-by-2', thus reducing the future effort when possible.
In my earlier versions, the code wasted time detecting that the known even input was even, then dividing it by 2, and then returning true. checkMagicRE() skips the test and divide, saving time.
CheckMagicRO() skips the test for odd, then performs the odd stuff and continues into checkMagicR().
Your recursion continues all the way down to 1 ... not necessary, and potentially the biggest hit to overall performance.
See my checkMagicR() "recursion termination clause". My code stops when ((n/2) < m_N), m_N being the test value that started this recursion. The idea is that all smaller test inputs are already proven magic, else the code would have asserted. It also halts when wraparound is imminent ... which has not happened.
Results:
My desktop (DD5) is older, slower, and running Ubuntu 15.10 (64). This code was developed with g++ v5.2.1, and only finishes without timeout assert using -O3.
DD5 is apparently 2x slower than the machine used by Amy Tavory (comparing how his code ran on DD5).
On DD5, my code completes in ~44 seconds in mode 1, and 22 seconds in mode 2.
A faster machine might complete this code in 11 seconds (in mode 2).
Full Code:
// V9
#include <chrono>
#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>
#include <thread>
#include <vector>
#include <cassert>
// chrono typedef - shorten one long name
typedef std::chrono::high_resolution_clock Clock_t;
class T431_t
{
private:
uint64_t m_N; // start-of-current-recursion
const uint64_t m_upperLimitN; // wrap-around avoidance / prevention
std::stringstream m_ssMagic; // dual threads use separate out streams
uint64_t m_MaxRcrsDpth; // diag only
uint64_t m_Max_n; // diag only
uint64_t m_TotalRecurse; // diag only
Clock_t::time_point m_startMS; // Derived req - progress info
std::stringstream m_ssDuration;
std::ostream* m_so; // dual threads - std::cout or m_ssMagic
uint64_t m_blk;
const int64_t m_MaxThreadDuration; // single Max80KMs, dual Max40KMs
// enum of known type (my preference for 'internal' constants)
enum T431_Constraints : uint64_t
{
ZERO = 0,
B00 = 1, // least-significant-bit is bit 0
ONE = 1, // lsbit set
TWO = 2,
THREE = 3,
//
K = 1000, // instead of 1024,
//
BlkSize = ((K * K * K) / 2), // 0.5 billion
//
MaxSingleCoreMS = (80 * K), // single thread max
MaxDualCoreMS = (40 * K) // dual thread max
};
// convenience method: show both hex and decimal on one line (see dtor)
inline std::string showHexDec(const std::string& label, uint64_t val) {
std::stringstream ss;
ss << "\n" << label
<< " = 0x" << std::hex << std::setfill('0') << std::setw(16) << val
<< " (" << std::dec << std::setfill(' ') << std::setw(22) << val << ")";
return (ss.str());
}
// duration: now() - m_startMS
std::chrono::milliseconds getChronoMillisecond() {
return (std::chrono::duration_cast<std::chrono::milliseconds>
(Clock_t::now() - m_startMS));
}
// reports milliseconds since m_startMS time
void ssDurationPut (const std::string& label)
{
m_ssDuration.str(std::string()); // empty string
m_ssDuration.clear(); // clear ss flags
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
m_ssDuration
<< label << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec";
}
inline void reportProgress()
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
assert(durationMS.count() < m_MaxThreadDuration); // normal finish -> "no endless loop"
//
uint64_t durationSec = (durationMS.count() / 1000);
uint64_t durationMSec = (durationMS.count() % 1000);
*m_so << " m_blk = " << m_blk
<< " m_N = 0x" << std::setfill('0') << std::hex << std::setw(16) << m_N
<< " " << std::dec << std::setfill(' ') << std::setw(3) << durationSec
<< "." << std::dec << std::setfill('0') << std::setw(3) << durationMSec
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10) << m_N << ")"
<< std::endl;
}
public:
T431_t(uint64_t maxDuration = MaxSingleCoreMS) :
m_N (TWO), // start val of current recursion
m_upperLimitN (initWrapAroundLimit()),
// m_ssMagic // default ctor ok - empty
m_MaxRcrsDpth (ZERO),
m_Max_n (TWO),
m_TotalRecurse (ZERO),
m_startMS (Clock_t::now()),
// m_ssDuration // default ctor ok - empty
m_so (&std::cout), // default
m_blk (ZERO),
m_MaxThreadDuration (maxDuration)
{
m_ssMagic.str().reserve(1024*2);
m_ssDuration.str().reserve(256);
}
~T431_t()
{
// m_MaxThreadDuration
// m_blk
// m_so
// m_ssDuration
// m_startMS
// m_Max_n
// m_TotalRecurse
// m_MaxRcrsDpth
if (m_MaxRcrsDpth) // diag only
{
ssDurationPut ("\n duration = " );
std::cerr << "\n T431() dtor "
//
<< showHexDec(" u64MAX",
std::numeric_limits<uint64_t>::max()) << "\n"
//
<< showHexDec(" m_Max_n", m_Max_n)
<< showHexDec(" (3*m_Max_n)+1", ((3*m_Max_n)+1))
<< showHexDec(" upperLimitN", m_upperLimitN)
// ^^^^^^^^^^^ no wrap-around possible when n is less
//
<< "\n"
<< showHexDec(" m_MaxRcrsDpth", m_MaxRcrsDpth)
<< "\n"
<< showHexDec(" m_TotalRecurse", m_TotalRecurse)
//
<< m_ssDuration.str() << std::endl;
}
// m_ssMagic
// m_upperLimitN
// m_N
if(m_MaxThreadDuration == MaxSingleCoreMS)
std::cerr << "\n " << __FILE__ << " by Douglas O. Moen Compiled on = "
<< __DATE__ << " " << __TIME__ << "\n";
}
private:
inline bool checkMagicRE(uint64_t nE) // n is known even, and the recursion starting point
{
return(true); // for unit test, comment out this line to run the asserts
// unit test - enable both asserts
assert(nE == m_N); // confirm recursion start value
assert(ZERO == (nE & B00)); // confirm even
return(true); // nothing more to do
}
inline bool checkMagicRO(uint64_t nO) // n is known odd, and the recursion starting point
{
// unit test - uncomment the following line to measure checkMagicRE() performance
// return (true); // when both methods do nothing - 0.044
// unit test - enable both these asserts
// assert(nO == m_N); // confirm recursion start value
// assert(nO & B00); // confirm odd
uint64_t nxt;
{
assert(nO < m_upperLimitN); // assert that NO wrap-around occurs
// NOTE: the sum of 4 odd uint's is even, and is destined to be
// divided by 2 in the next recursive invocation.
// we need not wait to divide by 2
nxt = ((nO+nO+nO+ONE) >> ONE); // combine the arithmetic
// |||||||||||| ||||||
// sum is even unknown after sum divided by two
}
return (checkMagicR(nxt));
}
inline void unwind8()
{
// skip 0, 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind10()
{
assert(checkMagicRE (m_N)); m_N++; // 0
assert(checkMagicRO (m_N)); m_N++; // 1
assert(checkMagicRE (m_N)); m_N++; // 2
assert(checkMagicRO (m_N)); m_N++; // 3
assert(checkMagicRE (m_N)); m_N++; // 4
assert(checkMagicRO (m_N)); m_N++; // 5
assert(checkMagicRE (m_N)); m_N++; // 6
assert(checkMagicRO (m_N)); m_N++; // 7
assert(checkMagicRE (m_N)); m_N++; // 8
assert(checkMagicRO (m_N)); m_N++; // 9
}
inline void unwind98()
{
unwind8();
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void unwind100()
{
unwind10(); // 0
unwind10(); // 1
unwind10(); // 2
unwind10(); // 3
unwind10(); // 4
unwind10(); // 5
unwind10(); // 6
unwind10(); // 7
unwind10(); // 8
unwind10(); // 9
}
inline void innerLoop (uint64_t start_N, uint64_t end_N)
{
for (uint64_t iLoop = start_N; iLoop < end_N; iLoop += 100)
{
unwind100();
}
reportProgress();
}
inline void outerLoop(const std::vector<uint64_t>& start_Ns,
const std::vector<uint64_t>& end_Ns)
{
*m_so << "\n outerLoop \n m_upperLimitN = 0x"
<< std::hex << std::setfill('0') << std::setw(16) << m_upperLimitN
<< "\n m_N = 0x" << std::setw(16) << start_Ns[0]
<< " " << std::dec << std::setfill(' ') << std::setw(3) << 0
<< "." << std::dec << std::setfill('0') << std::setw(3) << 0
<< " sec (" << std::dec << std::setfill(' ') << std::setw(10)
<< start_Ns[0] << ")" << std::endl;
size_t szStart = start_Ns.size();
size_t szEnd = end_Ns.size();
assert(szEnd == szStart);
m_blk = 0;
{ // when blk 0 starts at offset 2 (to skip values 0 and 1)
if(TWO == start_Ns[0])
{
m_N = TWO; // set m_N start
unwind98(); // skip 0 and 1
assert(100 == m_N); // confirm
innerLoop (100, end_Ns[m_blk]);
m_blk += 1;
}
else // thread 2 blk 0 starts in the middle of 5 billion range
{
m_N = start_Ns[m_blk]; // set m_N start, do full block
}
}
for (; m_blk < szStart; ++m_blk)
{
innerLoop (start_Ns[m_blk], end_Ns[m_blk]);
}
}
// 1 == argc
void exec1 (void) // no parameters --> check all values
{
const std::vector<uint64_t> start_Ns =
{ TWO, (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4),
(BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 0 blk 1 blk 2 blk 3 blk 4
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5),
(BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
// outerLoop : 10 blocks generates 10 progressReports()
outerLoop( start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
} // void exec1 (void)
void exec2a () // thread entry a
{
//lower half of test range
const std::vector<uint64_t> start_Ns =
{ TWO , (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4) };
// blk 0 blk 1 blk 2 blk 3 blk 4
const std::vector<uint64_t> end_Ns =
{ (BlkSize*1), (BlkSize*2), (BlkSize*3), (BlkSize*4), (BlkSize*5) };
m_startMS = Clock_t::now();
// m_so to std::cout
// uses 5 blocks to gen 5 progress indicators
outerLoop (start_Ns, end_Ns);
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " (a) " << std::endl;
} // exec2a (void)
void exec2b () // thread entry b
{
// upper half of test range
const std::vector<uint64_t> start_Ns =
{ (BlkSize*5), (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9) };
// blk 5 blk 6 blk 7 blk 8 blk 9
const std::vector<uint64_t> end_Ns =
{ (BlkSize*6), (BlkSize*7), (BlkSize*8), (BlkSize*9), (BlkSize*10) };
m_startMS = Clock_t::now();
m_so = &m_ssMagic;
// uses 5 blocks to gen 5 progress indicators
outerLoop(start_Ns, end_Ns);
ssDurationPut("");
m_ssMagic << " execDuration = " << m_ssDuration.str() << " (b) " << std::endl;
} // exec2b (void)
// 3 == argc
int exec3 (std::string argv1, // recursion start value
std::string argv2) // values count
{
uint64_t start_N = std::stoull(argv1);
uint64_t length = std::stoull(argv2);
// range checks
{
std::string note1;
switch (start_N) // limit check
{
case 0: { start_N = 3; length -= 3;
note1 = "... 0 is below test range (change start to 3 ";
} break;
case 1: { start_N = 3; length -= 2;
note1 = "... 1 is defined magic (change start to 3";
} break;
case 2: { start_N = 3; length -= 1;
note1 = "... 2 is trivially magic (change start to 3";
} break;
default:
{ // when start_N from user is even
if (ZERO == (start_N & B00))
{
start_N -= 1;
note1 = "... even is not an interesting starting point";
length += 1;
}
}
break;
} // switch (start_N)
// start_N not restrained ... allow any manual search
// but uin64_t wrap-around still preempted
std::cout << "\n start_N: " << start_N << " " << note1
<< "\n length: " << length << " " << std::endl;
}
m_startMS = Clock_t::now();
for (m_N = start_N; m_N < (start_N + length); ++m_N)
{
bool magicNumberFound = checkMagicRD (m_N, 1);
std::cout << m_ssMagic.str() << std::dec << m_N
<< " m_MaxRcrsDpth: " << m_MaxRcrsDpth << ";"
<< " m_Max_n: " << m_Max_n << ";" << std::endl;
m_ssMagic.str(std::string()); // empty string
m_ssMagic.clear(); // clear flags)
if (!magicNumberFound)
{
std::cerr << m_N << " not a magic number!" << std::endl;
break;
}
} // for (uint64_t k = kStart; k < kEnd; ++k)
ssDurationPut("");
std::cout << " execDuration = " << m_ssDuration.str() << " " << std::endl;
return(0);
} // int exec3 (std::string argv1, std::string argv2)
// conversion
std::string ssMagicShow() { return (m_ssMagic.str()); }
// D3: use recursion for closer match to definition
bool checkMagicR(uint64_t n) // recursive Performance
{
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return (false); // recursion termination clause
// NOTE: the sum of 4 odd uint's is even, and will be halved
// we need not wait for the next recursion to divide-by-2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// known even
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( true ); // recursion termination clause
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicR(nxt)); // tail recursion
} // bool checkMagicR(uint64_t n)
// D4: diagnostic text output
bool checkMagicRD (uint64_t n, uint64_t rd) // rd: recursion depth
{
if (n > m_Max_n) m_Max_n = n;
m_TotalRecurse += 1;
m_ssMagic << "\n" << rd << std::setw(static_cast<int>(rd)) << " "
<< " checkMagicRD (" << m_N << ", " << n << ")";
uint64_t nxt;
{
if (n & B00) // n-odd
{
if (n >= m_upperLimitN) // wraparound imminent abort
return ( terminateCheckMagic (nxt, rd, false) );
// NOTE: the sum of 4 odd uint's is even, and will be divided by 2
// we need not wait for the next recursion to divide by 2
nxt = ((n+n+n+ONE) >> ONE); // combine the arithmetic
// ||||||||| ||||||
// sum is even unknown after divide by two
}
else // n-even
{
nxt = (n >> ONE); // bit shift divide by 2
if (nxt < m_N) // nxt < m_N start of recursion
return ( terminateCheckMagic (nxt, rd, true) );
// We need not de-curse to 1 because all
// (n < m_N) are asserted as magic
}
}
return (checkMagicRD (nxt, (rd+1))); // tail recursion
} // bool checkMagicRD(uint64_t, uint64_t rd)
bool terminateCheckMagic (uint64_t n, uint64_t rd, bool rslt)
{
if (rd > m_MaxRcrsDpth) // diag only
{
std::chrono::milliseconds durationMS = getChronoMillisecond();
uint64_t durationSec = durationMS.count() / 1000;
uint64_t durationMSec = durationMS.count() % 1000;
m_MaxRcrsDpth = rd;
// generate noise each update - this does not happen often
static uint64_t count = 0;
std::cerr << "\n\n" << std::setfill(' ')
<< std::setw(30) << "["
<< std::setw(4) << durationSec << "." << std::setfill('0') << std::setw(3)
<< durationMSec << std::setfill(' ')
<< " sec] m_N: " << std::setw(10) << m_N
<< " n: " << std::setw(10) << n
<< " MaxRcrsDpth: " << std::setw(5) << m_MaxRcrsDpth
<< " Max_n: " << std::setw(16) << m_Max_n
<< " (" << count << ")" << std::flush;
count += 1; // how many times MaxRcrsDpth updated
}
m_ssMagic << "\n " << std::setw(3) << rd << std::setw(static_cast<int>(rd)) << " "
<< " " << (rslt ? "True " : ">>>false<<< ") << m_N << ", ";
return (rslt);
}
uint64_t initWrapAroundLimit()
{
uint64_t upLimit = 0;
uint64_t u64MAX = std::numeric_limits<uint64_t>::max();
// there exist a max number, m_upperLimitN
// where 3n+1 < u64MAX, i.e.
upLimit = ((u64MAX - 1) / 3);
return (upLimit);
} // uint64_t initWrapAroundLimit()
public:
int exec (int argc, char* argv[])
{
int retVal = -1;
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /* do nothing*/ }; }
// result: consistent run time
switch (argc)
{
case 1: // no parameters: 5 billion tests, 10 blocks, this instance, < 80 sec
{
exec1();
retVal = 0;
}
break;
case 2: // one parameter: 2 threads each runs 5/2 billion tests, in 5 blocks,
{ // two new instances, < 40 sec
// 2 new objects
T431_t t431a(MaxDualCoreMS); // lower test sequence
T431_t t431b(MaxDualCoreMS); // upper test sequence
// 2 threads
std::thread tA (&T431_t::exec2a, &t431a);
std::thread tB (&T431_t::exec2b, &t431b);
// 2 join's
tA.join();
tB.join();
// lower test sequence (tA) output directly to std::cout
// upper test sequence (tB) captured in T431_t::m_ssMagic. send it now
std::cout << t431b.ssMagicShow() << std::endl;
retVal = 0;
}
break;
case 3: // argv[1] -> start address, argv[2] -> length,
{
retVal = exec3 (std::string(argv[1]), std::string(argv[2]));
} break;
default :
{
std::cerr
<< "\nUsage: "
<< "\n argc (= " << argc << ") not expected, should be 0, 1, or 2 parameters."
<< "\n 1 (no parameters) : 5 billion tests, 10 blocks, < 80 sec\n"
//
<< "\n 2 (one parameter) : 2 threads, each 5/2 billion tests, 5 blocks, < 40 sec\n"
//
<< "\n 3 (two parameters): verbose mode: start argv[1], test count argv[2] \n"
//
<< "\n 3.1: \"./dumy431V4 3 1000 1> /tmp/dumy431V4.out2 2> /tmp/dumy431V4.out2a "
<< "\n 3 1 K std::cout redirected std::cerr redirected \n"
//
<< "\n 3.2: \"./dumy431V4 3 1000000000 1> /dev/null 2> /tmp/dumy431V4.out2b "
<< "\n 3 1 B discard std::cout std::cerr redirected \n"
//
<< "\n 3.3: \"./dumy431V4 15 100 "
<< "\n 15 100 (cout and cerr to console) \n"
<< std::endl;
retVal = 0;
}
} // switch
return(retVal);
} // int exec(int argc, char* argv[])
}; // class T431
int main (int argc, char* argv[])
{
std::cout << "argc: " << argc << std::endl;
for (int i=0; i<argc; i+=1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
int retVal;
{
T431_t t431;
retVal = t431.exec(argc, argv);
}
std::cout << "\nFINI " << std::endl;
return(retVal);
}
The code submitted:
a) uses assert(x) for all checks (because assert is quick and has side-effects that the optimizer can not ignore).
b) detects any endless loop by using an assert of the duration.
c) asserts to prevent any wrap-around.
When any assert occurs, the program does not terminate normally.
When this code finishes normally, it means:
a) no disproven numbers found, and
b) the running time < 80 seconds, so no endless loop occurred and
c) no wrap-around occurred
Notes about Recursive checkMagicR(uint64_t n):
3 forms of n are available in the recursion:
formal parameter n - typical of recursion calls
auto var nxt - receives the computed value of n for next recursive call
data attribute: m_N - the starting point of the currently running recursion effort
Upvotes: 0
Reputation: 76297
This question basically asks to verify the Collatz Conjecture up to 5B.
I think the key here is, for each number n we are checking, to view the optimistic scenario and the pessimistic scenario, and to check the optimistic one before reverting to the pessimistic one.
In the optimistic scenario, as we we modify n according to the n / 2 ; 3n + 1 rule, the sequence of numbers will:
in a finite number of steps become smaller than n (in which case we can check what we know about that smaller number).
will not cause an overflow in the steps.
(on which, as TonyK correctly notes, we cannot rely (even not on the first)).
So, for the optimistic scenario, we can use the following function:
#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>
using namespace std;
using found_set = unordered_set<size_t>;
bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
size_t tries = 0;
size_t n = i;
while(n != 1) {
if(++tries == max_tries )
return false;
if(n < i)
return found.empty() || found.find(i) == found.end();
if(n % 2 == 0)
n /= 2;
else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
return false;
}
return true;
}
Note the following:
The function only attempts to verify the conjecture for the number it receives. If it returns true
, it has been verified. If it returns false
, it just means it hasn't been verified (i.e., it does not mean it has been disproved).
It takes a parameter max_tries
, and only verifies up to this number of steps. If the number has been exceeded, it does not try to discern whether this is part of an infinite loop or not - it just returns that verification failed.
It takes an unordered_set
of known numbers that have failed (of course, if the Collatz conjecture is true, this set will always be empty).
It detects overflow via __builtin_*_overflow
. (Unfortunately, this is gcc specific, then. A different platform might require a different set of such functions.)
Now for the slow-but-sure function. This function uses the GNU MP multi-precision arithmetic library. It checks for an infinite loop by maintaining the sequence of numbers it has already encountered. This function returns true
if the conjecture has been proved for this number, and false
if has been disproved for this number (note the difference from the previous fast verification).
bool slow_check(size_t i) {
mpz_t n_;
mpz_init(n_);
mpz_t rem_;
mpz_init(rem_);
mpz_t i_;
mpz_init(i_);
mpz_set_ui(i_, i);
mpz_set_ui(n_, i);
list<mpz_t> seen;
while(mpz_cmp_ui(n_, 1) != 0) {
if(mpz_cmp_ui(n_, i) < 0)
return true;
mpz_mod_ui(rem_, n_, 2);
if(mpz_cmp_ui(rem_, 0) == 0) {
mpz_div_ui(n_, n_, 2);
}
else {
mpz_mul_ui(n_, n_, 3);
mpz_add_ui(n_, n_, 1);
}
seen.emplace_back(n_);
for(const auto &e0: seen)
for(const auto &e1: seen)
if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
return false;
}
return true;
}
Finally, main
maintains an unordered_set
of the disproven numbers. for each number, it optimistically tries to verify the conjecture, then, if it fails (for overflow or exceeding the number of iterations), uses the slow method:
int main()
{
const auto max_num = 5000000000;
found_set found;
for(size_t i = 1; i <= max_num; i++) {
if(i % 1000000000 == 0)
cout << "iteration " << i << endl;
auto const f = fast_verify(i, max_num, found);
if(!f && !slow_check(i))
found.insert(i);
}
for(auto e: found)
cout << e << endl;
}
Running the full code (below) gives:
$ g++ -O3 --std=c++11 magic2.cpp -lgmp && time ./a.out
iteration 1000000000
iteration 2000000000
iteration 3000000000
iteration 4000000000
iteration 5000000000
real 1m3.933s
user 1m3.924s
sys 0m0.000s
$ uname -a
Linux atavory 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ sudo lshw | grep -i cpu
*-cpu
description: CPU
product: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
bus info: cpu@0
version: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts cpufreq
I.e., no disproven numbers found, and the running time at ~64 seconds.
Full code:
#include <unordered_set>
#include <set>
#include <iostream>
#include <memory>
#include <list>
#include <gmp.h>
using namespace std;
using found_set = unordered_set<size_t>;
bool fast_verify(size_t i, size_t max_tries, const found_set &found) {
size_t tries = 0;
size_t n = i;
while(n != 1) {
if(++tries == max_tries )
return false;
if(n < i)
return found.empty() || found.find(i) == found.end();
if(n % 2 == 0)
n /= 2;
else if(__builtin_mul_overflow (n, 3, &n) || __builtin_add_overflow(n, 1, &n))
return false;
}
return true;
}
bool slow_check(size_t i) {
mpz_t n_;
mpz_init(n_);
mpz_t rem_;
mpz_init(rem_);
mpz_t i_;
mpz_init(i_);
mpz_set_ui(i_, i);
mpz_set_ui(n_, i);
list<mpz_t> seen;
while(mpz_cmp_ui(n_, 1) != 0) {
if(mpz_cmp_ui(n_, i) < 0)
return true;
mpz_mod_ui(rem_, n_, 2);
if(mpz_cmp_ui(rem_, 0) == 0) {
mpz_div_ui(n_, n_, 2);
}
else {
mpz_mul_ui(n_, n_, 3);
mpz_add_ui(n_, n_, 1);
}
seen.emplace_back(n_);
for(const auto &e0: seen)
for(const auto &e1: seen)
if(&e0 != &e1 && mpz_cmp(e0, e1) == 0)
return false;
}
return true;
}
int main()
{
const auto max_num = 5000000000;
found_set found;
for(size_t i = 1; i <= max_num; i++) {
if(i % 1000000000 == 0)
cout << "iteration " << i << endl;
auto const f = fast_verify(i, max_num, found);
if(!f && !slow_check(i))
found.insert(i);
}
for(auto e: found)
cout << e << endl;
return 0;
}
Upvotes: 4
Reputation: 17114
Your code, and Ami Tavory's answer, completely ignore the issue of overflow. If there were a non-magic number in your range, then the Collatz iteration would either enter a loop, or increase without bound. If it increases without bound, it will certainly overflow, whatever the size of your unsigned long
; and if it enters a loop, it might well overflow before it gets there. So you must put an overflow check in there:
#include <limits.h>
...
if (number > (ULONG_MAX - 1) / 3)
PrintAnApologyAndDie() ;
number = (number * 3) + 1;
Upvotes: 2