Reputation: 615
I have some very large lists of different lengths. I want to reduce them all to exactly the same size (for example, 1000 elements). I know there are some similar questions, but I didn't find the correct answer to my problem. Here is an example of what I tried. For simplicity, I only use three lists here.
a = list(range(10000))
b = list(range(9879))
c = list(range(10345))
# Now I want to reduce all the lists to exactly 1000 elements
# I tried this approach, which I read about in some other questions:
aa = a[::len(a) // 1000] # len(aa) = 1000
bb = b[::len(b) // 1000] # len(bb) = 1098
cc = c[::len(c) // 1000] # len(cc) = 1035
But with this approach, the resulting lists do not have the same length. How can I randomly remove some elements from bb and cc so that they also have exactly 1000 elements? I don't want to just remove the last x elements or the first x elements. Or is there a better way to reduce lists of different lengths to exactly the same length?
Edit: The order of the elements in the resulting lists (aa, bb, cc) should be the same as in my original lists. I don't want to shuffle them randomly.
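For reference, the differing lengths in the slicing attempt come from the fact that a slice with step s over n elements yields ceil(n / s) items, and the integer division `len(x) // 1000` rounds the step down. A small check of that arithmetic, where the helper name `sliced_len` is purely illustrative:

```python
def sliced_len(n, step):
    """Number of elements in seq[::step] for a sequence of length n."""
    # Ceiling division written with integers only
    return -(-n // step)

for n in (10000, 9879, 10345):
    step = n // 1000
    # The formula agrees with what slicing actually produces
    assert sliced_len(n, step) == len(list(range(n))[::step])
    print(n, step, sliced_len(n, step))
# prints:
# 10000 10 1000
# 9879 9 1098
# 10345 10 1035
```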
Upvotes: 1
Views: 1091
Reputation: 2075
import random
a = list(range(10000))
b = list(range(9879))
c = list(range(10345))
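def randomm(x, u):
    # Draw a random index in [0, x) that is not yet in u
    while True:
        r = random.randrange(x)  # randint(0, x) could return x, an invalid index
        if r not in u:
            u.append(r)
            return r
ua, ub, uc = [], [], []  # one container of used indices per list
aa = [a[randomm(len(a), ua)] for i in range(1000)]
bb = [b[randomm(len(b), ub)] for i in range(1000)]
cc = [c[randomm(len(c), uc)] for i in range(1000)]
I keep a container of already-drawn indices for each list, so that no index is repeated. Note that the picked elements come out in random order.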
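A minimal sketch of an order-preserving variant of the same idea, using only the standard library: `random.sample` draws distinct positions in one call, and sorting those positions restores the original order. The function name `downsample_keep_order` and its `seed` parameter are assumptions for illustration, not something from the question.

```python
import random

def downsample_keep_order(items, k, seed=None):
    """Pick k distinct elements at random, keeping their original order."""
    rng = random.Random(seed)
    # sample() returns k distinct positions; sorting them restores list order
    positions = sorted(rng.sample(range(len(items)), k))
    return [items[i] for i in positions]

a = list(range(10000))
b = list(range(9879))
c = list(range(10345))

aa = downsample_keep_order(a, 1000)
bb = downsample_keep_order(b, 1000)
cc = downsample_keep_order(c, 1000)
```

`random.sample` raises a `ValueError` if k is larger than the list, so lists shorter than the target size would need separate handling.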
Upvotes: 0
Reputation: 71
According to the first comment, your code would look like this:
from random import sample
a = list(range(10000))
b = list(range(9879))
c = list(range(10345))
# Now I want to reduce all the lists to exactly 1000 elements
# I tried this approach, which I read about in some other questions:
aa = sample(a[::len(a) // 1000],1000) # len(aa) = 1000
bb = sample(b[::len(b) // 1000],1000) # len(bb) = 1000
cc = sample(c[::len(c) // 1000],1000) # len(cc) = 1000
Note that the elements of aa, bb and cc are now shuffled.
A non-shuffled solution would be:
import numpy as np
a = np.arange(10000)
b = np.arange(9879)
c = np.arange(10345)
# Now I want to reduce all the lists to exactly 1000 elements
# I tried this approach, which I read about in some other questions:
indices = np.arange(len(a))  # all positions in a
keep = np.random.permutation(len(a))[:1000]  # 1000 distinct random positions to keep
selected = np.isin(indices, keep, assume_unique=True)  # boolean mask of the kept positions; faster when inputs are unique
aa = a[selected]  # len(aa) = 1000, original order preserved
indices = np.arange(len(b))
keep = np.random.permutation(len(b))[:1000]
selected = np.isin(indices, keep, assume_unique=True)
bb = b[selected]  # len(bb) = 1000
indices = np.arange(len(c))
keep = np.random.permutation(len(c))[:1000]
selected = np.isin(indices, keep, assume_unique=True)
cc = c[selected]  # len(cc) = 1000
Upvotes: 2
Reputation: 5745
I would just do as you did and then cut off the extra elements at the end. It is impossible to spread the elements exactly evenly without some extra elements when the length is not divisible by 1000, so:
aa = a[::len(a) // 1000][:1000]
bb = b[::len(b) // 1000][:1000]
cc = c[::len(c) // 1000][:1000]
If you insist on not dropping the leftover elements at the end, you can apply the other answer after the slicing step and choose the elements to drop randomly.
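If dropping the tail is a concern, one deterministic alternative is to compute the positions directly so that they cover the whole list, with gaps between neighbours that differ by at most one. This is only a sketch under that assumption; the function name `evenly_spaced` is made up for illustration.

```python
def evenly_spaced(items, k):
    """Return k elements spread across items, in their original order."""
    n = len(items)
    if k > n:
        raise ValueError("k must not exceed len(items)")
    # i * n // k maps i = 0..k-1 onto strictly increasing positions in 0..n-1
    return [items[i * n // k] for i in range(k)]

a = list(range(10000))
b = list(range(9879))
c = list(range(10345))

aa = evenly_spaced(a, 1000)
bb = evenly_spaced(b, 1000)
cc = evenly_spaced(c, 1000)
```

For b, the chosen positions start 0, 9, 19, ... and end at 9869, so the sample reaches almost to the end of the list instead of stopping where the first 1000 slice elements run out.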
Upvotes: 2
Reputation: 74
You can use the random.shuffle function and then take the first x elements of each list.
import random
a = list(range(10000))
b = list(range(9879))
c = list(range(10345))
random.shuffle(a)
random.shuffle(b)
random.shuffle(c)
# Now I want to reduce all the lists to exactly 1000 elements
# I tried this approach, which I read about in some other questions:
aa = a[:1000]
bb = b[:1000]
cc = c[:1000]
Upvotes: 2