Reputation: 1463
I am working on finding the rows which contain a particular string. The dataset has close to 1 million rows. Here is a simple example:
import pandas as pd
text=['abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111www','anontalk.com']
text=pd.Series(text)
srhc=text.str.findall('www')
srhc
And the output is:
0 [www, www]
1 []
dtype: object
Is it possible to efficiently (i.e. programmatically) obtain just the list of indices of the rows that contain the text www? Help is appreciated.
Upvotes: 0
Views: 699
Reputation: 28644
I think it is more efficient to use a list comprehension to get your indexes, especially since there is nothing unique or special about the index of the series:
text=['abc [email protected] 123 any@www foo @ bar 78@ppp @5555 aa@111www','anontalk.com']
#I use this to stay true to your question
text=pd.Series(text)
#this gets you the index/indices
#which is what you want, based on your question
[index for index, entry in enumerate(text) if 'www' in entry]
[0]
Upvotes: 0
Reputation: 2180
To search for a specific substring, use .str.contains():
text = ['abc [email protected]', 'helowww', '123 any@www', 'foo www', '@5555 aa@111www', 'anontalk.com']
text = pd.Series(text)
text[text.str.contains('www')]
Output:
1 helowww
2 123 any@www
3 foo www
4 @5555 aa@111www
dtype: object
To get the index of these:
text[text.str.contains('www')].index.to_list()
# or this
text.index[text.str.contains('www')]
Output:
[1, 2, 3, 4]
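On a large dataset, two optional arguments of .str.contains() may be worth a look: regex=False treats the pattern as a literal substring instead of a regular expression, and na=False counts missing values as non-matches. A minimal sketch, assuming made-up sample data (the None entry is only there to illustrate a missing value):

```python
import pandas as pd

# Hypothetical sample data; the None entry stands in for a missing value.
text = pd.Series(['helowww', None, 'foo www', 'anontalk.com'])

# regex=False: match 'www' as a plain substring rather than a regex pattern.
# na=False: treat missing entries as "does not contain".
mask = text.str.contains('www', regex=False, na=False)

# Index labels of the matching rows, as a plain Python list.
matches = text.index[mask].to_list()
print(matches)  # [0, 2]
```

Whether regex=False is noticeably faster depends on the data and the pandas version, so it is worth timing on a sample of the real dataset.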
Upvotes: 0
Reputation: 195418
You can filter text.index with str.contains():
srhc = text.index[text.str.contains('www')]
print(srhc)
Prints:
Int64Index([0], dtype='int64')
Upvotes: 1
Reputation: 323226
We can combine str.contains with nonzero:
srhc=text.str.contains('www').to_numpy().nonzero()[0]
srhc
Out[66]: array([0], dtype=int64)
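Note that nonzero() returns row positions, while text.index[...] returns index labels; the two coincide only when the series has the default 0..n-1 index. A small sketch, assuming a made-up series with the illustrative labels 10, 20, 30:

```python
import pandas as pd

# Hypothetical series with a non-default index to show the difference.
text = pd.Series(['foo www', 'anontalk.com', 'aa@111www'], index=[10, 20, 30])
mask = text.str.contains('www')

positions = mask.to_numpy().nonzero()[0]  # 0-based row positions
labels = text.index[mask]                 # index labels of the same rows

print(positions.tolist())  # [0, 2]
print(labels.tolist())     # [10, 30]
```

Use the positions with .iloc and the labels with .loc when selecting the rows afterwards.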
Upvotes: 1