Michael Strobel

Reputation: 367

How to get 3 unique values using random.randint() in python?

I am trying to populate a list in Python 3 with 3 random items read from a file using a regex, but I keep getting duplicate items in the list. Here is an example.

import re
import random as rn

data = '/root/Desktop/Selenium[FILTERED].log'
with open(data, 'r') as inFile:
    index = inFile.read()
    URLS = re.findall(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', index)

    list_0 = []
    for i in range(3):
        list_0.append(URLS[rn.randint(0, len(URLS) - 1)])  # randint includes both endpoints

for i in range(len(list_0)):
    print(list_0[i])

What would be the cleanest way to prevent duplicate items being appended to the list?

(EDIT) This is the code that I think has done the job quite well.

def random_sample(data):
    r_e = [r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', '..']  # raw string keeps the backslashes intact
    with open(data, 'r') as inFile:
        urls = re.findall(r_e[0], inFile.read())
        x = list(set(urls))  # the with block closes the file on exit
    return x

data = '/root/Desktop/[TEMP].log'
sample = random_sample(data)
for i in range(3):
    print(sample[i])

A set is an unordered collection with no duplicate entries.

Upvotes: 1

Views: 1080

Answers (2)

pjs

Reputation: 19855

Use the builtin random.sample.

random.sample(population, k)
    Return a k length list of unique elements chosen from the population sequence or set.
    Used for random sampling without replacement.
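
As a minimal sketch of that call, assuming the matches have already been collected into a list (the URLs below are placeholders, not taken from the log file in the question):

```python
import random

# Placeholder data standing in for the regex matches (hypothetical values).
urls = ['url1', 'url2', 'url3', 'url4', 'url5']

# Pick 3 distinct positions from the list; no position is chosen twice.
picked = random.sample(urls, 3)
print(picked)
```

Uniqueness here is by position, not by value, so if the input list itself contains repeated URLs the result can still contain repeats.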

Addendum

After seeing your edit, it looks like you've made things much harder than they have to be. I've hardwired a list of URLs in the following, but the source doesn't matter. Selecting the (guaranteed unique) subset is essentially a one-liner with random.sample:

import random

# the following two lines are easily replaced
URLS = ['url1', 'url2', 'url3', 'url4', 'url5', 'url6', 'url7', 'url8']
SUBSET_SIZE = 3

# the following one-liner yields the randomized subset as a list
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
print(urlList)    # produces, e.g., => ['url7', 'url3', 'url4']

Note that by using len(URLS) and SUBSET_SIZE, the one-liner that does the work is hardwired to neither the size of the list nor the desired subset size.


Addendum 2

If the original list of inputs contains duplicate values, the following slight modification will fix things for you:

URLS = list(set(URLS))  # this converts to a set for uniqueness, then back for indexing
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]

Or, skipping the index lookup and sampling the values directly (random.sample stopped accepting sets in Python 3.11, so convert back to a list first):

URLS = list(set(URLS))
urlList = random.sample(URLS, SUBSET_SIZE)
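
To tie this back to the code in the question, a rough sketch of the whole pipeline might look like the following. The function name pick_unique_urls and the sample log text are made up for illustration; only the regular expression comes from the question.

```python
import random
import re

# Pattern copied from the question.
URL_PATTERN = re.compile(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}')

def pick_unique_urls(text, k):
    """Return up to k distinct URLs found in text, chosen at random."""
    # dict.fromkeys drops duplicates while keeping first-seen order,
    # and list() turns the keys back into a sequence for random.sample.
    unique = list(dict.fromkeys(URL_PATTERN.findall(text)))
    # random.sample raises ValueError if k exceeds the population size,
    # so cap k at the number of distinct URLs.
    return random.sample(unique, min(k, len(unique)))

# Made-up log text for illustration only.
log_text = """
GET https://www.example.com/view?i=abc123
GET https://www.example.com/view?i=abc123
GET https://www.sample.com/view?i=def456
GET https://www.demo.com/view?i=ghi789
GET https://www.test.com/view?i=jkl000
"""
print(pick_unique_urls(log_text, 3))
```

Capping k is a design choice for this sketch; raising an error when the file has too few distinct URLs would be equally reasonable.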

Upvotes: 3

Pablo Miranda

Reputation: 369

seen = set(list_0)
randValue = URLS[rn.randint(0, len(URLS) - 1)]  # randint includes both endpoints

# [...]

if randValue not in seen:
  seen.add(randValue)
  list_0.append(randValue)

Now you just need to check that the size of list_0 is equal to 3 to stop the loop.
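
Put together, the loop could be sketched as below. This is one possible arrangement, not necessarily what the answer had in mind: the URLS list is placeholder data, rn.randrange(len(URLS)) is swapped in so that every valid index (including 0) can be drawn, and the target variable is an added safeguard.

```python
import random as rn

# Placeholder list standing in for the regex matches (hypothetical values).
URLS = ['url1', 'url2', 'url2', 'url3', 'url4']

list_0 = []
seen = set()
# Never ask for more distinct values than exist, or the loop would not end.
target = min(3, len(set(URLS)))

while len(list_0) < target:
    # randrange(len(URLS)) covers every valid index, including 0.
    randValue = URLS[rn.randrange(len(URLS))]
    if randValue not in seen:
        seen.add(randValue)
        list_0.append(randValue)

print(list_0)
```

This retries on every collision, so it slows down as the requested count approaches the number of distinct URLs; random.sample avoids that.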

Upvotes: 1
