Mrcitrusboots
Mrcitrusboots

Reputation: 317

Way to efficiently calculate XOR(^) checksum based on a list of IDs

When googling information about Python list comprehension I was offered a google foobar challenge, which I've been slowly working on the past few days for fun. The latest challenge:

enter image description here

effectively calls for generating a list of IDs, ignoring an increasing number from each new line until there's one ID left. Then you are supposed to XOR(^) the IDs to produce a checksum. I created a working program that outputs the correct answer, however it's not efficient enough to pass all test cases(passes 6/10) in the time allotted. A length of 50,000 should produce a result in under 20 seconds, but it takes 320.

Could someone steer me in the right direction, but please don't do it for me, I'm having fun pushing myself with this challenge. Perhaps there's a data structure or algorithm I can implement to speed up the computation time?

Logic behind code:

  1. First, the starting ID and length are taken in

  2. A list of IDs is generated, ignoring an increasing number of IDs from each new line, starting with ignoring 0 from the first line.

  3. XORs all of the numbers in the list of IDS using a for loop

  4. Answer is returned as an int


import timeit
def answer(start,length):
    x = start
    lengthmodified = length
    answerlist = []
    for i in range (0,lengthmodified): #Outter for loop runs an amount of times equal to the variable "length".
        prestringresult = 0
        templist = []
        for y in range (x,x + length): #Fills list with ids for new line
            templist.append(y)
        for d in range (0,lengthmodified): #Ignores an id from each line, increasing by one with each line, and starting with 0 for the first
            answerlist.append(templist[d])
        lengthmodified -= 1
        x += length    
        for n in answerlist: #XORs all of the numbers in the list via a loop and saves to prestringresult
            prestringresult ^= n
        stringresult = str(prestringresult) 
        answerlist = [] #Emptys list
        answerlist.append(int(stringresult)) #Adds the result of XORing all of the numbers in the list to the answer list
    #print(answerlist[0]) #Print statement allows value that's being returned to be checked, just uncomment it
    return (answerlist[0]) #Returns Answer



#start = timeit.default_timer()
answer(17,4)
#stop = timeit.default_timer()
#print (stop - start) 

Upvotes: 4

Views: 4417

Answers (4)

Suresh Lamichhane
Suresh Lamichhane

Reputation: 566

Most of the people would get Time limit exceeded in this question. I did! This question can be concluded in this way: "Find the XOR of all the numbers that lies between certain range in constant time." Yes, Constant time!

So from 3-6, the answer should be 3^4^5^6 = 4 in O(1) time.

Solution: XOR are associative in nature. So A ^ B ^ C can be written as B ^ A ^ C. Also, we know that XOR means: 'AND'ing the same bits result into True i.e 1 and different bits result into 2.

From these 2 nature we can write: XOR between all the numbers from 3-6 can be written as:

3^4^5^6 = (0^1^2)^(0^1^2) ^ (3^4^5^6)
        = (0^1^2^3^4^5^6) ^ (0^1^2) (this comes from the associative nature of xor)
        = XOR betn all the numbers from (0-6) ^ XOR betn all the numbers from (0-2)...eq(1)

So Now if we can find XOR of all the numbers from 0 to certain integer in a constant time, we will get our answer.

Luckily, there exists a pattern for us:

See this for an example:

(0-1): 0 ^ 1 = 1 (1)
(0-2): 0 ^ 1 ^ 2 = 3 (2+1)
(0-3): 0 ^ 1 ^ 2 ^ 3 = 0 (0)
(0-4): 0 ^ 1 ^ 2 ^ 3 ^ 4 = 4 (4)

(0-5): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 = 1 (1)
(0-6): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 = 7 (6+1)
(0-7): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 ^  7 = 0 (0)
(0-8): 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 5 ^ 6 ^ 7 ^ 8 = 8 (8)


So the pattern for finding the xor between all the integers between 0 to n is:
if n%4 == 1 then, answer = 1
if n%4 == 2 then, answer = n+1
if n%4 == 3 then, answer = 0
if n%4 == 0 then answer = n 

Therefore, XOR(0-6) becomes 7 (since 6%4 ==2) and XOR(0-2) becomes 3 (since 2%4 ==2)

Therefore, the eq(1) now becomes:
3^4^5^6 = 7 ^ 3 = 4

Now the question is quite easy, most of us get stuck because of the time limit exceeded error, because we try to xor in each loop which would be huge if the number of input/iteration increases. Here is my working solution in python for which all the test cases were passed by google:

#Main Program
def answer(start, length):
    checkSum = 0
    for l in range(length, 0, -1):
        checkSum = checkSum ^ (getXor(start + l-1) ^ getXor(start-1))
        start = start + length
    return checkSum

def getXor(x):
    result = [x, 1, x+1, 0]
    return result[x % 4]

Upvotes: 3

Stefan Pochmann
Stefan Pochmann

Reputation: 28656

You'll likely need a different approach, not just minor improvements like John's. I just wrote a solution that can do answer(0, 50000) in about 2 seconds on my PC. I still do it row by row, but instead of xoring all numbers in the row's range, I do it bit by bit. How many numbers in the row have the 1-bit set?[*] An odd number of numbers? Then I flip the 1-bit of my answer. And then the same with for the 2-bit, the 4-bit, the 8-bit, etc, up to the 230-bit. So for each row, it's just 31 little calculations (instead of actually xoring tens of thousands of numbers).

[*] Can be computed quickly in constant time just from start/stop of the range.

Edit: Since you asked for another hint, here's how to compute how often the 1-bit is set in some range(a, b). Compute how often it's set in the range(0, a) and subtract that from how often it's set in the range(0, b). It's easier if the range starts at zero. How often is the 1-bit set in some range(0, c)? Easy: c//2 times. So how often is the 1-bit set in some range(a, b)? Simply b//2 - a//2 times. The higher bits are similar, just a little more complicated.

Edit 2: Oh wait, I just remembered... there's an easy way to compute the xor of all numbers in some range(a, b). Again split the work into doing range(0, a) and range(0, b). The xor of all numbers in some range(0, c) is easy because there's a nice pattern (you'll see it if you do it for all c from 0 to let's say 30). Using this, I now solve answer(0, 50000) in about 0.04 seconds.

Upvotes: 5

Michael
Michael

Reputation: 184

I was able to get a little improvement without using list, but it still will fail on large numbers. Nested loops kill speed. I think you need to follow Pochmann logic as brute force is rarely the way to go for these types of problems.

enter image description here

Upvotes: 2

John Kugelman
John Kugelman

Reputation: 362087

Neither templist nor answerlist are really needed. Let's take a couple passes at your code to see how to eliminate them.

  1. First, let's make templist's initialization a one-liner. This:

    templist = []
    for y in range (x,x + length):
        templist.append(y)
    

    Becomes this:

    templist = list(range(x, x + length))
    
  2. Then let's do the same for answerlist. This:

    for d in range (0,lengthmodified):
        answerlist.append(templist[d])
    

    Becomes this:

    answerlist.extend(templist[:lengthmodified])
    
  3. Now let's take a look at how they're used later. If we ignore lengthmodified -= 1 and x += length for now, we have:

    templist = list(range(x, x + length))
    answerlist.extend(templist[:lengthmodified])
    
    for n in answerlist:
        prestringresult ^= n
    
    answerlist = []
    

    Instead of extending answerlist, iterating over it, and then clearing it, it'd be faster to just iterate over templist.

    templist = list(range(x, x + length))
    
    for n in templist[:lengthmodified]:
        prestringresult ^= n
    

    And now there's no need for templist either, so let's skip building it as well.

    for n in range(x, x + lengthmodified):
        prestringresult ^= n
    

    templist and answerlist are gone.

The only missing piece here is working answerlist.append(int(stringresult)) back in. I'll leave that for you to figure out.

Overall, the lesson here is to avoid explicit for loops whenever you can. Writing lots of for loops that iterate over containers is a C way of thinking. In Python there are often ways to chew through collections all at once. Doing so lets you leverage the language's speedy builtin operations.

As a bonus, idiomatic Python is plain easier to read, too.

Upvotes: 1

Related Questions