trying to split list by percentage

Question

I'm trying to split a list by taking in a percentage and randomly grabbing elements out of the main list into 2 other lists. The trainingSet is the leftover list. I'm running into a problem when I'm generating a random index to pick from. This code works with a small list, but it fails when the list is large (len(rawRatings) = 1000).

error:

  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in
    # Used internally for debug sandbox under external interpreter
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 29, in partitionRankings
  File "/Users/rderickson9/anaconda/lib/python2.7/random.py", line 241, in randint
    return self.randrange(a, b+1)
  File "/Users/rderickson9/anaconda/lib/python2.7/random.py", line 217, in randrange
    raise ValueError, "empty range for randrange() (%d,%d, %d)" % (istart, istop, width)
ValueError: empty range for randrange() (0,0, 0)

rawRatings is a list, and testPercent is a float.

Example:

rawRatings = [(123,432,4),(23,342,3),(23,123,5),(234,523,3),(34,23,1), (12,32,4)]
testPercent = .2
partitionRankings(rawRatings, testPercent)
[(23,123,5),(234,523,3),(34,23,1),(123,432,4),(12,32,4)],[(23,342,3)]


def partitionRankings(rawRatings, testPercent):
    testSet = []
    trainingSet = []
    howManyNumbers = int(round(testPercent*len(rawRatings)))
    declineRandom = 0
    while True:
        if declineRandom == howManyNumbers:
            break
        randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)
        testSetTuple = rawRatings[randomIndex]
        del rawRatings[randomIndex]
        testSet.append(testSetTuple)

        declineRandom = declineRandom + 1
    trainingSet = rawRatings[:]
    return (trainingSet), (testSet)

I don't want to choose the same random index twice. Once I choose one, I don't want to randomly select it again. I don't think the line below is correct. This is the part I'm having trouble with.

randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)

Rob Watts · Accepted Answer

Since order of the training set does not matter, you can do this with an entirely different strategy - shuffle the list of rawRatings, and then take the first howManyNumbers elements as your test set, and the rest as your training set.

import random

def partitionRankings(rawRatings, testPercent):
    howManyNumbers = int(round(testPercent*len(rawRatings)))
    shuffled = rawRatings[:]
    random.shuffle(shuffled)
    return shuffled[howManyNumbers:], shuffled[:howManyNumbers]
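
As a quick sanity check of the shuffle-and-slice approach, here is a small sketch that runs it on a list of 1000 items. The data is made up for illustration (the tuples only mimic the shape of the ratings in the question), and the function is repeated so the snippet runs on its own.

```python
import random

def partitionRankings(rawRatings, testPercent):
    # Same shuffle-and-slice approach as above, repeated so this runs standalone.
    howManyNumbers = int(round(testPercent * len(rawRatings)))
    shuffled = rawRatings[:]          # copy, so the caller's list is not modified
    random.shuffle(shuffled)
    return shuffled[howManyNumbers:], shuffled[:howManyNumbers]

# Hypothetical data: 1000 made-up tuples shaped like the ratings in the question.
rawRatings = [(i, i * 2, i % 5 + 1) for i in range(1000)]
trainingSet, testSet = partitionRankings(rawRatings, 0.6)

# 60% of 1000 goes to the test set, the rest to the training set.
print("%d training, %d test" % (len(trainingSet), len(testSet)))

# Every original element ends up in exactly one of the two lists.
print(sorted(trainingSet + testSet) == sorted(rawRatings))
```

Because the function works on a copy, rawRatings still has all 1000 elements afterwards, unlike the loop in the question, which deletes from the caller's list.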

As for why your code as you have it doesn't work, the problem is, as you guessed, with this line:

randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)

The problem is with -declineRandom.

- Every time you go through the loop, you remove the entry that you picked, so even if you were to get the same index again, you would not be picking the same element.
- If you didn't remove the element from the list on each iteration, subtracting declineRandom would not prevent picking the same element twice; it would only prevent you from picking any of the last declineRandom elements.
  - You'd have to move the picked elements to the end of the list at each iteration.
- Because you delete elements and then don't replace them at the end of the list, len(rawRatings) shrinks while declineRandom grows.
  - If you have a list of 1000 items and try to put 600 in the test set, then with 550 items in the test set you would be trying to get a random int that is greater than or equal to zero and less than or equal to (450-1)-550=-101. Obviously you wouldn't actually get to that point, because the call already fails once 500 items have been removed and the upper bound drops to (500-1)-500=-1, but hopefully it makes the issue clear.
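
If you'd rather keep the loop, one way to apply the points above is to drop the -declineRandom term and let the shrinking list set the upper bound. This is only a sketch: the name partitionRankingsLoop is made up to keep it apart from the shuffle version, and working on a copy instead of deleting from rawRatings is my own choice, not something the question requires.

```python
import random

def partitionRankingsLoop(rawRatings, testPercent):
    # Hypothetical name; a loop-based variant of the function in the question.
    remaining = rawRatings[:]         # copy, so the caller's list is left alone
    testSet = []
    howManyNumbers = int(round(testPercent * len(remaining)))
    for _ in range(howManyNumbers):
        # remaining has already shrunk by one per pick, so its current
        # length minus one is the correct upper bound for randint.
        randomIndex = random.randint(0, len(remaining) - 1)
        # pop() removes the element, so it cannot be picked a second time.
        testSet.append(remaining.pop(randomIndex))
    return remaining, testSet

# Made-up data, 1000 items with 60% going to the test set (the case that failed).
ratings = [(i, i * 2, i % 5 + 1) for i in range(1000)]
trainingSet, testSet = partitionRankingsLoop(ratings, 0.6)
print("%d training, %d test" % (len(trainingSet), len(testSet)))
```

Each pop() from the middle of a list shifts the elements after it, so for large lists the shuffle version is likely the cheaper of the two.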
