Reputation: 1171
I am trying to use the split function within numpy as follows:
desc = np.array(['Alu Bokhara','Kurma Polao'])
Then I am trying to extract and print the first word from each element within the array as follows:
np.array([np.split(i,' ')[0] for i in desc])
Then I am getting the error:
tuple index out of range
Any hints on this issue?
Thanks
Upvotes: 2
Views: 483
Reputation: 221524
As an alternative vectorized approach, you could use np.core.defchararray.split (also available as np.char.split):
[i[0] for i in np.core.defchararray.split(desc, sep=' ')]
Basically, we split each element on the space character, which separates the words into one sub-list per element, and then select the first element from each sub-list.
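For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It uses the np.char.split alias rather than the np.core.defchararray path, and the variable names (words, first_words, plain_first) are just illustrative. It also shows what is probably the most direct fix for the code in the question: np.split is meant for cutting arrays into sub-arrays, so calling the string's own split method is likely what was intended.

```python
import numpy as np

desc = np.array(['Alu Bokhara', 'Kurma Polao'])

# np.char.split applies str.split to every element and returns an
# object array whose entries are Python lists of words.
words = np.char.split(desc, sep=' ')

# Select the first word from each sub-list.
first_words = np.array([w[0] for w in words])

# Plain-Python variant: each element of desc is a str subclass,
# so its own .split method can be called directly.
plain_first = np.array([s.split(' ')[0] for s in desc])

print(first_words)
print(plain_first)
```

Both variants still loop in Python to pick out the first word; only the splitting step itself is handed to NumPy in the first one.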
Sample run -
In [117]: desc
Out[117]:
array(['Alu Bokhara', 'Kurma Polao'],
dtype='|S11')
In [118]: [i[0] for i in np.core.defchararray.split(desc, sep=' ')]
Out[118]: ['Alu', 'Kurma']
Runtime test -
In [142]: desc = np.array(['Then I', 'am trying to' ,'extract and', 'print',\
...: 'the first word from each', 'element within the', 'array'])
In [143]: %timeit pd.Series(desc).str.split().str[0].values #@piRSquared's soln
1000 loops, best of 3: 509 µs per loop
In [144]: %timeit [i[0] for i in np.core.defchararray.split(desc, sep=' ')]
100000 loops, best of 3: 13.8 µs per loop
Upvotes: 2
Reputation: 294218
You could use pandas. This is an example of using a pandas.Series to do the splitting and get back an array.
import pandas as pd
np.array(pd.Series(desc).str.split().tolist())
array([['Alu', 'Bokhara'],
['Kurma', 'Polao']],
dtype='<U7')
For just the first word:
pd.Series(desc).str.split().str[0].values
array(['Alu', 'Kurma'], dtype=object)
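As a rough, self-contained version of the snippet above (assuming pandas is installed), the sketch below also passes n=1 to str.split so that each string is split at most once. That is an optional tweak, not something the original snippet does, and the names s and first are only illustrative.

```python
import numpy as np
import pandas as pd

desc = np.array(['Alu Bokhara', 'Kurma Polao'])

# Wrap the array in a Series to get access to the .str accessor.
s = pd.Series(desc)

# Split on whitespace at most once (n=1), then take element 0 of
# each resulting list and convert back to a NumPy array.
first = s.str.split(n=1).str[0].to_numpy()

print(first)
```

Limiting the number of splits should matter little for short strings like these; it mainly avoids building long word lists when only the first word is needed.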
Upvotes: 1