Rohan Nagalkar

Reputation: 443

Numpy: Get the elements of an array where a comparison is True

import numpy as np
import re

list1 = ['651ac1', '21581', '13737|14047', '22262', '12281', '12226', '61415', '61495']
regexp = '[a-zA-Z]'
# One boolean per element: True if the element contains at least one ASCII letter
selection = np.array([bool(re.search(regexp, element)) for element in list1])
if True in selection:
    pass  # get_element_containing_true: this is the part I'm asking about

The selection looks like this:

selection
array([ True, False, False, False, False, False, False, False], dtype=bool)

I want to get the element(s) of list1 for which selection is True. How do I do this?

Upvotes: 2

Views: 707

Answers (2)

Divakar

Reputation: 221674

You could get those elements directly within the list comprehension:

[element for element in list1 if bool(re.search(regexp, element))]

Looking more closely, re.search returns a match object when there is a match:

In [175]: re.search(regexp, list1[0])
Out[175]: <_sre.SRE_Match at 0x7fc30bac1c60>

When there is no match, it returns None.
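The two outcomes can be checked with a minimal sketch, using the first two elements of list1 from the question. It assumes Python 3.7 or later, where the match object's repr reads re.Match (older versions print _sre.SRE_Match, as in the IPython transcript); the names match and no_match are only for illustration.

```python
import re

regexp = '[a-zA-Z]'

# '651ac1' contains letters, so re.search returns a match object
# for the first letter it finds ('a' at index 3)
match = re.search(regexp, '651ac1')
print(match)          # <re.Match object; span=(3, 4), match='a'>
print(match.group())  # a

# '21581' contains no letters, so re.search returns None
no_match = re.search(regexp, '21581')
print(no_match)       # None
```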

According to the Python documentation on Boolean operations:

In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: False, None, numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets). All other values are interpreted as true. User-defined objects can customize their truth value by providing a __bool__() method.

So if the result of search is used directly as the condition of the if clause, a match object is interpreted as true and None as false. We can therefore skip the bool() call and use a simplified version:

[element for element in list1 if re.search(regexp, element)]
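A quick sketch to confirm that dropping bool() does not change the result, using the data from the question; the variable names with_bool and without_bool are only for illustration.

```python
import re

list1 = ['651ac1', '21581', '13737|14047', '22262', '12281', '12226', '61415', '61495']
regexp = '[a-zA-Z]'

# Explicit conversion of the search result to bool
with_bool = [element for element in list1 if bool(re.search(regexp, element))]

# The if clause already applies truth value testing:
# a match object counts as true, None counts as false
without_bool = [element for element in list1 if re.search(regexp, element)]

print(with_bool)     # ['651ac1']
print(without_bool)  # ['651ac1']
```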

Upvotes: 2

MSeifert

Reputation: 152775

Do you actually need numpy here (see @Divakar's answer if you don't)? If you do, you could convert list1 to an np.array and index it with selection:

np.array(list1)[selection]

This is called boolean array indexing, in case you want to look it up.
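A self-contained sketch of boolean array indexing with the question's data. It also swaps `True in selection` for selection.any(), which does the same check, and uses np.flatnonzero to get the positions of the True entries; neither of those two calls appears in the original answers, and the names arr, matches and positions are only for illustration.

```python
import re
import numpy as np

list1 = ['651ac1', '21581', '13737|14047', '22262', '12281', '12226', '61415', '61495']
regexp = '[a-zA-Z]'
selection = np.array([bool(re.search(regexp, element)) for element in list1])

arr = np.array(list1)      # fixed-width unicode array, dtype '<U11' here

if selection.any():        # NumPy equivalent of `True in selection`
    # Boolean indexing keeps only the positions where the mask is True
    matches = arr[selection]
    print(matches)           # ['651ac1']
    print(matches.tolist())  # ['651ac1']

# Indices of the True entries, if the positions are needed instead
positions = np.flatnonzero(selection)
print(positions)           # [0]
```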


Just a performance tip: if you use a regular expression several times, compile it once and reuse the compiled pattern:

regexp = re.compile('[a-zA-Z]')
selection = np.array([bool(regexp.search(element)) for element in list1])

That can be faster, because the pattern no longer has to be looked up in the re module's internal cache on every call, and it combines easily with the other answer.
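One way the compiled pattern could be combined with the list comprehension from the other answer is sketched below. This is a minimal sketch; the name pattern is chosen here for clarity (the answer above reuses regexp for the compiled object).

```python
import re

list1 = ['651ac1', '21581', '13737|14047', '22262', '12281', '12226', '61415', '61495']

# Compile the pattern once; the compiled object has its own .search method
pattern = re.compile('[a-zA-Z]')

# Divakar's list comprehension, using the compiled pattern instead of re.search
matches = [element for element in list1 if pattern.search(element)]
print(matches)  # ['651ac1']
```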

Upvotes: 3
