jay

Reputation: 1463

Get indices in pandas series while using str.findall

I am trying to find the rows that contain a particular string. The dataset has close to 1 million rows. Here is a simple example:

import pandas as pd

text = ['abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111www', 'anontalk.com']
text = pd.Series(text)
srhc = text.str.findall('www')
srhc

The output is:

0    [www, www]
1    []        
dtype: object

Is it possible to efficiently (i.e. programmatically) obtain just the list of indices of the rows that contain the text www? Any help is appreciated.

Upvotes: 0

Views: 699

Answers (4)

sammywemmy

Reputation: 28644

I think a list comprehension is a more efficient way to get the indices, especially since there is nothing unique or special about the index of the series:

text=['abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111www','anontalk.com']

# Keep the data as a Series to stay true to your question
text = pd.Series(text)

# Collect the positions of the entries that contain 'www',
# which is what the question asks for
[index for index, entry in enumerate(text) if 'www' in entry]

[0]
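
One caveat: enumerate yields positions, which only coincide with the index labels when the series has the default 0..n-1 index. If the series carried its own labels, Series.items() could be used instead. A minimal sketch, assuming a made-up series with a custom index (the data and labels here are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical series with non-default index labels (not from the question)
text = pd.Series(['aa@111www', 'anontalk.com', 'foo www'], index=[10, 20, 30])

# enumerate() yields positions 0, 1, 2, ...
positions = [pos for pos, entry in enumerate(text) if 'www' in entry]

# Series.items() yields (label, value) pairs instead
labels = [label for label, entry in text.items() if 'www' in entry]

print(positions)  # [0, 2]
print(labels)     # [10, 30]
```

With the default index used in the question, both lists would be identical.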

Upvotes: 0

Sy Ker

Reputation: 2180

To search for a specific substring, use .str.contains():

text = ['abc [email protected]', 'helowww', '123 any@www', 'foo www', '@5555 aa@111www', 'anontalk.com']

text = pd.Series(text)

text[text.str.contains('www')]

Output:

1            helowww
2        123 any@www
3            foo www
4    @5555 aa@111www
dtype: object

To get the indices of these rows:

text[text.str.contains('www')].index.to_list()

# or filter the index directly (this returns an Index; call .to_list() on it for a plain list)
text.index[text.str.contains('www')]

Output:

[1, 2, 3, 4]
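
Note that .str.contains() interprets the pattern as a regular expression by default, and returns a missing value for missing entries. For a plain substring like 'www' this makes no difference, but on messy data it may be worth passing regex=False and na=False. A rough sketch, using a small made-up series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical data with a missing value and a pattern containing a regex metacharacter
text = pd.Series(['helowww', None, 'a.b', 'axb'])

# regex=False: treat the pattern literally; na=False: missing values count as non-matches
mask = text.str.contains('www', regex=False, na=False)
www_idx = text.index[mask].tolist()

# With the default regex=True, '.' matches any character, so 'axb' also matches
dot_regex = text.index[text.str.contains('a.b', na=False)].tolist()

# With regex=False, only the literal 'a.b' matches
dot_literal = text.index[text.str.contains('a.b', regex=False, na=False)].tolist()

print(www_idx)      # [0]
print(dot_regex)    # [2, 3]
print(dot_literal)  # [2]
```

Without na=False, the mask would contain missing values for the None entry, and using such a mask to index typically raises an error.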

Upvotes: 0

Andrej Kesely

Reputation: 195418

You can filter text.index with str.contains():

srhc = text.index[text.str.contains('www')]
print(srhc)

Prints:

Int64Index([0], dtype='int64')

Upvotes: 1

BENY

Reputation: 323226

We can combine str.contains with NumPy's nonzero:

srhc=text.str.contains('www').to_numpy().nonzero()[0]
srhc
Out[66]: array([0], dtype=int64)
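
Keep in mind that nonzero returns integer positions rather than index labels. If the series had a non-default index, the positions could be mapped back to labels through text.index. A small sketch, assuming a hypothetical string-labelled series (np.flatnonzero is used here as a shorthand for nonzero()[0] on a 1-D array):

```python
import numpy as np
import pandas as pd

# Hypothetical series with string labels (not from the question)
text = pd.Series(['aa@111www', 'anontalk.com', 'foo www'], index=['a', 'b', 'c'])

# Boolean mask as a NumPy array
mask = text.str.contains('www', regex=False).to_numpy()

# Integer positions of the True entries
positions = np.flatnonzero(mask)

# Map the positions back to index labels
labels = text.index[positions]

print(positions.tolist())  # [0, 2]
print(labels.tolist())     # ['a', 'c']
```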

Upvotes: 1
