Josep Espasa

Reputation: 759

Index of a pattern in a String vector with missing values in Julia

I am attempting to retrieve the index of all instances of a pattern in a String vector with missing values.

For example, how can I get a vector with the indices of all elements containing the pattern "a" from:

x = ["ab", "ca", "bc", missing, "ad"]

The desired outcome would be equal to:

Vector([1, 2, 5])
3-element Vector{Int64}:
 1
 2
 5

These are the indices at which the pattern appears.

Upvotes: 4

Views: 220

Answers (2)

Sundar R

Reputation: 14685

The findall version can be even neater:

julia> findall(contains("a"), skipmissing(x))
3-element Vector{Int64}:
 1
 2
 5

The first cool thing about this is that contains returns a curried version of itself when supplied only the pattern to search for. So here, contains("a") returns a function that searches for "a" inside any given String argument, and we pass that function on to findall as the predicate.
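As a small illustration of that currying (the name has_a is only for this sketch, and the one-argument form of contains assumes Julia 1.5 or later):

```julia
# contains("a") returns a function equivalent to s -> contains(s, "a")
has_a = contains("a")

has_a("ab")   # true
has_a("bc")   # false

# The curried form and the explicit two-argument form agree
has_a("ca") == contains("ca", "a")   # true
```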

Even cooler is the way skipmissing works. As the name indicates, it skips missing values in its argument. It does this not by filtering them out of x (which would shift the indices of all values after a missing), but by providing an iterator that jumps past the missing values while keeping the original indices. This means that the indices findall returns when iterating skipmissing(x) are valid for x too, which is exactly what we want.
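To see the difference, here is a rough comparison of the lazy iterator with an eagerly filtered copy. It uses the same x as in the question; the variable names are only for this sketch, and keys on a skipmissing iterator assumes Julia 1.1 or later:

```julia
x = ["ab", "ca", "bc", missing, "ad"]

# The lazy iterator keeps the parent's indices: 4 is skipped, 5 stays 5
kept = collect(keys(skipmissing(x)))                           # [1, 2, 3, 5]

# So findall reports positions that are valid in x itself
lazy_idx = findall(contains("a"), skipmissing(x))              # [1, 2, 5]

# Collecting first builds a new 4-element vector, so "ad" moves to position 4
eager_idx = findall(contains("a"), collect(skipmissing(x)))    # [1, 2, 4]
```

Only lazy_idx can be used to index back into x directly.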

Upvotes: 1

Bogumił Kamiński

Reputation: 69819

A natural way to write it is:

julia> findall(v -> ismissing(v) ? false : contains(v, "a"), x)
3-element Vector{Int64}:
 1
 2
 5

Alternatively you could write:

julia> using Missings

julia> findall(coalesce.(passmissing(contains).(x, "a"), false))
3-element Vector{Int64}:
 1
 2
 5

which in this case is less readable, but in other contexts you might find passmissing and coalesce useful so I mention them.
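To make the second version easier to follow, here it is split into steps. This assumes the Missings package is installed; the intermediate names matches and mask are only for this sketch:

```julia
using Missings

x = ["ab", "ca", "bc", missing, "ad"]

# passmissing(contains) returns missing whenever an argument is missing,
# instead of throwing a MethodError
matches = passmissing(contains).(x, "a")   # [true, true, false, missing, true]

# coalesce replaces each remaining missing with false, giving a plain Bool mask
mask = coalesce.(matches, false)           # [true, true, false, false, true]

# findall on a Bool vector returns the positions of the true entries
idx = findall(mask)                        # [1, 2, 5]
```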

Upvotes: 4
