Reputation: 5408
I am trying to convert a vector of the letters A to L into something like the table below, using pandas and NumPy built-in functions (tile, repeat and reshape) rather than loops, but I cannot wrap my head around it:
0 1 2 3 4 5 6 7 8 9 10 11
0 A A A A E E E E I I I I
1 B B B B F F F F J J J J
2 C C C C G G G G K K K K
3 D D D D H H H H L L L L
4 A A A A E E E E I I I I
5 B B B B F F F F J J J J
6 C C C C G G G G K K K K
7 D D D D H H H H L L L L
Do you have any ideas how I could do that without loops?
What I have tried so far:
a = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'])
b = a.reshape(3,4)
np.repeat(b, 4).reshape(4,12)
gives me:
array([['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
['D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'F', 'F', 'F', 'F'],
['G', 'G', 'G', 'G', 'H', 'H', 'H', 'H', 'I', 'I', 'I', 'I'],
['J', 'J', 'J', 'J', 'K', 'K', 'K', 'K', 'L', 'L', 'L', 'L']],
dtype='<U1')
EDIT: Some background. Depending on the number of samples and the layout we choose, a machine creates plates (like in this image). We can do consecutive operations (add more chemicals, etc.), and unique combinations are obtained based on the previous layout. Afterwards the machine measures, e.g., the concentration in each well, and I would like to link the output to the conditions in each well. Because the machine can take a measurement after each step, a lot of data can be generated, and I am trying to find a generic solution without too many loops.
Upvotes: 3
Views: 1147
Reputation: 152725
You could use:
>>> import numpy as np
>>> x = np.array(list('abcdefghijkl'.upper())) # your "vector"
>>> np.repeat(np.tile(x.reshape(-1, 4), 2).T, 4, axis=1)
array([['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L'],
['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L']],
dtype='<U1')
It first reshapes the vector so that each row holds 4 characters, then tiles that array twice along the columns. Transposing then turns each group of 4 characters into a column (stacked twice), and finally every character is repeated 4 times along each row.
Step-by-step it looks like this:
>>> import pandas as pd
>>> x
array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'],
dtype='<U1')
>>> x.reshape(-1, 4)
array([['A', 'B', 'C', 'D'],
['E', 'F', 'G', 'H'],
['I', 'J', 'K', 'L']],
dtype='<U1')
>>> np.tile(_, 2)
array([['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
['E', 'F', 'G', 'H', 'E', 'F', 'G', 'H'],
['I', 'J', 'K', 'L', 'I', 'J', 'K', 'L']],
dtype='<U1')
>>> _.T
array([['A', 'E', 'I'],
['B', 'F', 'J'],
['C', 'G', 'K'],
['D', 'H', 'L'],
['A', 'E', 'I'],
['B', 'F', 'J'],
['C', 'G', 'K'],
['D', 'H', 'L']],
dtype='<U1')
>>> np.repeat(_, 4, axis=1)
array([['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L'],
['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L']],
dtype='<U1')
>>> pd.DataFrame(_)
0 1 2 3 4 5 6 7 8 9 10 11
0 A A A A E E E E I I I I
1 B B B B F F F F J J J J
2 C C C C G G G G K K K K
3 D D D D H H H H L L L L
4 A A A A E E E E I I I I
5 B B B B F F F F J J J J
6 C C C C G G G G K K K K
7 D D D D H H H H L L L L
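Since the question asks for something generic, the same three steps could be wrapped in a small function. This is only a sketch: the name `plate_layout` and its parameters `block_rows`, `row_tiles` and `col_repeats` are hypothetical, and it assumes the number of labels is divisible by `block_rows`.

```python
import numpy as np

def plate_layout(labels, block_rows=4, row_tiles=2, col_repeats=4):
    """Hypothetical helper wrapping the reshape/tile/transpose/repeat steps."""
    labels = np.asarray(labels)
    # One group of `block_rows` labels per row (assumes the length divides evenly).
    blocks = labels.reshape(-1, block_rows)
    # Place `row_tiles` copies of each group side by side.
    tiled = np.tile(blocks, row_tiles)
    # Transpose so each group runs down a column, then widen every column.
    return np.repeat(tiled.T, col_repeats, axis=1)

layout = plate_layout(list("ABCDEFGHIJKL"))
print(layout.shape)  # (8, 12)
```

With the defaults it should reproduce the array above; other plate shapes would only need different arguments.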
Upvotes: 3
Reputation: 215047
import numpy as np
a = np.array(list("ABCDEFGHIJKL"))
a
# array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'],
# dtype='<U1')
np.repeat(np.tile(a.reshape(3,4), 2).T, 4, axis=1)
#array([['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
# ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
# ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
# ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L'],
# ['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
# ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
# ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
# ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L']],
# dtype='<U1')
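As an alternative that may be easier to adapt, the same layout can be produced by computing, for every output cell, which element of `a` belongs there and then using integer-array indexing. This is a sketch rather than part of the original answer; the names `rows`, `cols` and `idx` are made up for illustration.

```python
import numpy as np

a = np.array(list("ABCDEFGHIJKL"))

rows = np.arange(8)[:, None]    # output row numbers as a column vector
cols = np.arange(12)[None, :]   # output column numbers as a row vector

# Within a block, the row picks one of 4 labels (cycling every 4 rows);
# every 4 columns move on to the next block of 4 labels.
idx = rows % 4 + 4 * (cols // 4)

result = a[idx]   # broadcasting makes idx (8, 12), so result is (8, 12)
```

Because the mapping is written as plain arithmetic on row and column numbers, a different plate layout would only need a different formula for `idx`.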
Upvotes: 2