sajis997

Reputation: 1171

numpy split error - too many indices for array

I am trying to use the split function within numpy as follows:

desc = np.array(['Alu Bokhara','Kurma Polao'])

Then I am trying to extract and print the first word from each element within the array as follows:

np.array([np.split(i,' ')[0] for i in desc])

Then I get the error:

tuple index out of range

Any hints on this issue?

Thanks

Upvotes: 2

Views: 483

Answers (2)

Divakar

Reputation: 221524

As an alternative vectorized approach, you could use np.core.defchararray.split (also exposed under the public alias np.char.split) -

[i[0] for i in np.core.defchararray.split(desc, sep=' ')]

Basically, we split each element on the space character, which gives one sub-list of words per element, and then select the first element of each sub-list.

Sample run -

In [117]: desc
Out[117]: 
array(['Alu Bokhara', 'Kurma Polao'], 
      dtype='|S11')

In [118]: [i[0] for i in np.core.defchararray.split(desc, sep=' ')]
Out[118]: ['Alu', 'Kurma']
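
For reference, the split-then-index approach above can be written as a self-contained script. This is only a sketch: it assumes the public alias np.char.split rather than the np.core path (np.core is a private namespace in recent NumPy releases), and the names parts and first_words are made up for illustration.

```python
import numpy as np

desc = np.array(['Alu Bokhara', 'Kurma Polao'])

# np.char.split applies str.split to every element and returns
# an object array whose entries are Python lists of words.
parts = np.char.split(desc, sep=' ')

# Pick the first word from each list and pack the result into a string array.
first_words = np.array([p[0] for p in parts])
print(first_words)
```

Note that np.split itself is unrelated to strings: it cuts an array into sub-arrays along an axis, which is why it cannot be used on the individual elements here.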

Runtime test -

In [142]: desc = np.array(['Then I', 'am trying to' ,'extract and', 'print',\
     ...:     'the first word from each', 'element within the', 'array'])

In [143]: %timeit pd.Series(desc).str.split().str[0].values #@piRSquared's soln
1000 loops, best of 3: 509 µs per loop

In [144]: %timeit [i[0] for i in np.core.defchararray.split(desc, sep=' ')]
100000 loops, best of 3: 13.8 µs per loop

Upvotes: 2

piRSquared

Reputation: 294218

You could use pandas. This is an example of using a pandas.Series to do the splitting and get back an array.

import pandas as pd

np.array(pd.Series(desc).str.split().tolist())

array([['Alu', 'Bokhara'],
       ['Kurma', 'Polao']], 
      dtype='<U7')

For just the first word

pd.Series(desc).str.split().str[0].values

array(['Alu', 'Kurma'], dtype=object)
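
If a NumPy string array is wanted instead of an object array, one possible variant is sketched below. It assumes a pandas version that provides Series.to_numpy, and the n=1 argument (stop after the first split) is an optional tweak that is not part of the answer above; the name first is illustrative.

```python
import numpy as np
import pandas as pd

desc = np.array(['Alu Bokhara', 'Kurma Polao'])

# Split on whitespace at most once, keep the first piece of each element,
# then convert the result to a NumPy unicode array instead of dtype=object.
first = pd.Series(desc).str.split(n=1).str[0].to_numpy(dtype=str)
print(first)
```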

Upvotes: 1