Reputation: 31
I'm wondering how to generate a random numpy array of integers that encode DNA bases. I have the basic numpy call working, but the only way I've found to get the output I want is to convert the numpy array into a list of strings and then back to integers, which I'd like to avoid.
#A = 1
#T = 2
#G = 3
#C = 4
np.random.randint(1, 5, size=(5, 3))
array([[1, 2, 1],
[2, 2, 3],
[2, 4, 2],
[4, 2, 1],
[1, 3, 4]])
The desired output is a numpy array of integers, one per row:
array([[121],
[223],
[242],
[421],
[134]])
Thank you for any ideas
Upvotes: 0
Views: 229
Reputation: 2511
Here's another answer with numpy.
Strategy: first precompute the bases (there are only 4**3 = 64 of them, so it's no biggie), and then sample from them with np.random.choice.
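from itertools import product
import numpy as np
nums = "1234"
# All 64 three-digit codes; build a list, because map() is lazy in Python 3
bases = [int("".join(p)) for p in product(nums, nums, nums)]
np.random.choice(bases, 10**8)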
The conversion to int happens once, during the precompute step, so it won't be a bottleneck. This generates a hundred million base pairs in a few seconds on a MacBook.
If you want to generate a lot of base pairs, this way is about 5 times faster (3 seconds vs 17 seconds for 10**8 random bases) than the one-liner that first generates random numbers and then takes the dot product. That strategy makes two passes over the data, whereas this one makes a single pass.
In general, if you want d base pairs and N samples, then this does the trick:
bases = [int("".join(p)) for p in product(nums, repeat=d)]
np.random.choice(bases, N)
If d is larger than 8 or 9, then bases (which has 4**d entries) will be sufficiently long that you probably would be better off going with the other version using the dot product. But if d is small, then this is definitely faster.
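For reference, here is the general version wrapped into a function. This is only a minimal sketch: the name random_codes and the seed parameter are made up for illustration, and it uses the newer np.random.default_rng generator rather than np.random.choice directly.

```python
from itertools import product

import numpy as np


def random_codes(d, n_samples, seed=None):
    """Sample n_samples random d-digit codes whose digits are all in 1..4.

    Hypothetical helper, not part of numpy.
    """
    # Precompute all 4**d codes once; the str -> int conversion happens only here.
    codes = np.array([int("".join(p)) for p in product("1234", repeat=d)])
    rng = np.random.default_rng(seed)  # seed is only there for reproducibility
    # A single pass: draw n_samples entries (with replacement) from the table.
    return rng.choice(codes, size=n_samples)


sample = random_codes(3, 10, seed=0)
print(sample[:, None])  # column shape, like the output asked for in the question
```

The lookup table grows as 4**d, so this only makes sense while d stays small.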
Upvotes: 1
Reputation: 17867
Why not construct a 3-digit integer from the 3 separate integers you already have?
import numpy as np
r = np.random.randint(1, 5, size=(5, 3))
print((r[:, 0] * 100 + r[:, 1] * 10 + r[:, 2])[:, None])
Output:
[[444]
[332]
[213]
[434]
[341]]
Depending on the required output shape, you might not need to do the reshape via [:, None]. But this version yields exactly the example output format.
One-liner:
A more compact version uses the dot product between the random matrix and a vector of decimal powers:
print(np.random.randint(1, 5, size=(5, 3)).dot([100, 10, 1])[:, None])
More flexible:
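In general, you can generate the array for any number of rows n and digits per row d. The digits still have to be drawn from 1 to 4, and the powers of ten run from the highest down so that the first column becomes the leading digit:
print(np.random.randint(1, 5, size=(n, d)).dot(10 ** np.arange(d - 1, -1, -1))[:, None])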
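To make the flexible version easier to check, here is a small self-contained sketch. The function name random_codes_dot and the seed argument are assumptions added for illustration; the fixed rows at the end are taken from the question so the arithmetic can be verified by hand.

```python
import numpy as np


def random_codes_dot(n, d, seed=None):
    """Return an (n, 1) array of random d-digit codes with digits in 1..4.

    Hypothetical helper, not part of numpy.
    """
    rng = np.random.default_rng(seed)
    digits = rng.integers(1, 5, size=(n, d))  # upper bound is exclusive: 1..4
    # Descending powers of ten, e.g. [100, 10, 1] for d = 3.
    # int64 holds up to 18 decimal digits, so keep d <= 18.
    powers = 10 ** np.arange(d - 1, -1, -1, dtype=np.int64)
    return digits.dot(powers)[:, None]


codes = random_codes_dot(5, 3, seed=0)
print(codes)

# Worked check on two rows from the question: [1, 2, 1] -> 121, [4, 2, 1] -> 421
fixed = np.array([[1, 2, 1], [4, 2, 1]])
check = fixed.dot(10 ** np.arange(2, -1, -1, dtype=np.int64))
print(check)
```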
Upvotes: 2
Reputation: 805
I think you would be best off using the approach described in your question: int() --> str() --> int()
>>> thing = np.random.randint(1, 5, size=(5, 3))
>>> [int(''.join([str(x) for x in a])) for a in thing]
Out[47]: [414, 311, 221, 232, 131]
Or, for a numpy type answer:
>>> foo = lambda x: int(''.join([str(n) for n in x]))
>>> np.apply_along_axis(foo, 1, thing)
Out[7]: array([414, 311, 221, 232, 131])
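As a quick sanity check, the string round trip can be compared with the arithmetic approach from the other answer on a fixed input. This sketch uses the rows printed in the question instead of random numbers so the result is reproducible; the variable names via_str and via_dot are made up for illustration.

```python
import numpy as np

# Rows copied from the example array in the question.
thing = np.array([[1, 2, 1],
                  [2, 2, 3],
                  [2, 4, 2],
                  [4, 2, 1],
                  [1, 3, 4]])

# int -> str -> int, one row at a time.
via_str = np.array([int("".join(str(x) for x in row)) for row in thing])

# Same result via the dot product with decimal powers.
via_dot = thing.dot([100, 10, 1])

print(via_str)
print(np.array_equal(via_str, via_dot))
```

Both give the same integers; the string version just does the work in a Python loop rather than inside numpy.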
Upvotes: 0