Reputation: 6848
Dataframe is:
date ids_x ids_y
0 2011-04-23 [0, 1, 2, 10, 11, 12, 13] []
1 2011-04-24 [0, 1, 2, 10, 11, 12, 13] [12,4]
2 2011-04-25 [0, 1, 2, 3, 4, 1, 12] []
3 2011-04-26 [0, 1, 2, 3, 4, 5, 6] [4,5,6]
The convenient but slow way is to use:
df['ids'] = df['ids_x'] + df['ids_y']
I want to do this with numpy; for now it is very slow (4 seconds). Since Pandas uses numpy, I think I should use numpy directly, without Pandas, to reduce the overhead.
I tried np.column_stack, but the output is:
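import numpy as np
a = np.array([[1,2,3],[4,5,6]])
# b is ragged (rows of different lengths); recent NumPy versions need dtype=object here
b = np.array([[9,8,7],[6,5,4,6,7,8]], dtype=object)
np.column_stack((a,b))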
[out]: array([[1, 2, 3, [9, 8, 7]], [4, 5, 6, [6, 5, 4, 6, 7, 8]]], dtype=object)
Upvotes: 3
Views: 211
Reputation: 69126
The problem with np.column_stack is that the rows of b do not have equal lengths, so b is a 1-D array with a dtype of object whose elements are lists, and each list is stacked as a single element.
You can do this with np.concatenate (or, as @John Galt said in the comments, np.append); e.g.:
In [43]: [np.concatenate((i,j)) for i,j in zip(a,b)]
Out[43]: [array([1, 2, 3, 9, 8, 7]), array([4, 5, 6, 6, 5, 4, 6, 7, 8])]
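Applied to the DataFrame from the question, the same idea might look like the sketch below. The frame is rebuilt by hand from the sample rows shown in the question, and the names `merged` and `df["ids"]` are only illustrative. One thing to watch: an empty list is converted to a float64 array, so the result is cast back to int explicitly. Whether this is actually faster than `df['ids_x'] + df['ids_y']` is not measured here and would need timing on the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame rebuilt from the sample rows in the question.
df = pd.DataFrame({
    "date": ["2011-04-23", "2011-04-24", "2011-04-25", "2011-04-26"],
    "ids_x": [[0, 1, 2, 10, 11, 12, 13],
              [0, 1, 2, 10, 11, 12, 13],
              [0, 1, 2, 3, 4, 1, 12],
              [0, 1, 2, 3, 4, 5, 6]],
    "ids_y": [[], [12, 4], [], [4, 5, 6]],
})

# Join each row's two lists with np.concatenate, pairing them up with zip.
# np.concatenate turns an empty list into a float64 array, which would make
# the whole result float, so cast back to int.
merged = [np.concatenate((x, y)).astype(int)
          for x, y in zip(df["ids_x"], df["ids_y"])]

# Store plain Python lists again so the new column matches ids_x and ids_y.
df["ids"] = [m.tolist() for m in merged]
```

If the arrays themselves are what you need downstream, you could keep `merged` as it is and skip the conversion back to lists.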
Upvotes: 1