qertoip

Reputation: 1880

How to find index of a substring?

Looking for Elixir equivalent of Ruby's:

"[email protected]".index("@")         # => 9
"[email protected]".index("domain")    # => 10

Upvotes: 16

Views: 11452

Answers (5)

Jose.m.g

Reputation: 11

# index (as INSTR from basic...)

...
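 defmodule StringIndex do
   # Returns the zero-based byte offset of searchstring in mainstring,
   # or nil when there is no match (0 would be ambiguous with a match
   # at the very start of the string).
   def index(mainstring, searchstring) do
     case :binary.match(mainstring, searchstring) do
       {position, _length} -> position
       :nomatch -> nil
     end
   end
 end

 IO.puts(StringIndex.index("algopara ver", "ver"))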
... 

9

Upvotes: 1

qertoip

Reputation: 1880

TL;DR: String.index/2 is intentionally missing because smarter alternatives exist. Very often String.split/2 will solve the underlying problem, and with much better performance.

  • I assume we are talking about UTF-8 strings here and expect to deal cleanly with non-ASCII characters.

  • Elixir encourages fast code. It turns out that the problems we usually try to solve with String.index/2 can be solved in a much smarter way, vastly improving performance without degrading code readability.

  • The smarter solution is to use String.split/2 and/or other similar String module functions. String.split/2 works at the byte level while still handling graphemes correctly. It can't go wrong because both arguments are Strings! A String.index/2 would have to work at the grapheme level, slowly seeking through the String.

  • For that reason String.index/2 is unlikely to be added to the language unless very compelling use cases come up that cannot be cleanly solved by existing functions.

  • See also the elixir-lang-core discussion on that matter: https://groups.google.com/forum/#!topic/elixir-lang-core/S0yrDxlJCss

  • On a side note, Elixir is pretty unique in its mature Unicode support. While most languages work on a codepoint level (colloquially "characters"), Elixir works with the higher-level concept of graphemes. Graphemes are what users perceive as one character (let's say it's a more practical understanding of a "character"). A grapheme can contain more than one codepoint (which in turn can be encoded as more than one byte).
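To make the byte / codepoint / grapheme distinction concrete, here is a small sketch. The sample string (an "e" followed by a combining acute accent) is an illustrative choice, not something taken from the discussion above.

```elixir
# "e" followed by U+0301 (combining acute accent): one visible character.
s = "e\u0301"

graphemes = String.length(s)               # what a user perceives as characters
codepoints = length(String.codepoints(s))  # Unicode codepoints
bytes = byte_size(s)                       # UTF-8 bytes

IO.inspect({graphemes, codepoints, bytes})
# => {1, 2, 3}
```

An index into this string therefore means three different things depending on which of the three units is being counted.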

Finally, if we really need the index:

case String.split("[email protected]", "domain", parts: 2) do
  [left, _] -> String.length(left)
  [_] -> nil
end
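The same idea can be wrapped in a function. This is only a sketch: the module name `SubstringIndex`, the function name `grapheme_index/2`, and the sample address are hypothetical. The result counts graphemes, not bytes.

```elixir
defmodule SubstringIndex do
  # Returns the zero-based grapheme index of the first occurrence of
  # `pattern` in `string`, or nil when `pattern` does not occur.
  def grapheme_index(string, pattern) do
    case String.split(string, pattern, parts: 2) do
      # A match splits the string in two; the left part is everything before it.
      [left, _rest] -> String.length(left)
      # No match: the string comes back as a single piece.
      [_whole] -> nil
    end
  end
end

IO.inspect(SubstringIndex.grapheme_index("user.name@domain.com", "domain"))  # => 10
IO.inspect(SubstringIndex.grapheme_index("aéiou", "o"))                      # => 3
IO.inspect(SubstringIndex.grapheme_index("aéiou", "x"))                      # => nil
```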

Upvotes: 18

Paweł Obrok

Reputation: 23164

You can use Regex.run/3 and pass it return: :index as an option:

iex(5)> [{start, len}] = Regex.run(~r/abc/, " abc ", return: :index)
[{1, 3}]
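For a literal substring rather than a hand-written pattern, one option is to escape it first. A sketch, assuming a hypothetical module `RegexIndex`; note that with `return: :index` the offsets are byte offsets, and `Regex.run/3` returns nil when nothing matches.

```elixir
defmodule RegexIndex do
  # Returns the zero-based byte offset of the first occurrence of
  # `substring` in `string`, or nil when there is no match.
  def byte_index(string, substring) do
    # Escape the substring so characters like "." are matched literally.
    regex = Regex.compile!(Regex.escape(substring))

    case Regex.run(regex, string, return: :index) do
      [{start, _len} | _] -> start
      nil -> nil
    end
  end
end

IO.inspect(RegexIndex.byte_index(" abc ", "abc"))    # => 1
IO.inspect(RegexIndex.byte_index("abc a.c", "a.c"))  # => 4
IO.inspect(RegexIndex.byte_index(" abc ", "xyz"))    # => nil
```

Without the escaping step, the second call would match "abc" at offset 0, because an unescaped "." matches any character.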

Upvotes: 7

Gazler

Reputation: 84150

You can get the byte index using :binary.match/2:

{index, length} = :binary.match("aéiou", "o")    
{4, 1}

If you want the position in the string, counted in codepoints rather than bytes, then for a single character use:

"aéiou" |> to_charlist() |> Enum.find_index(&(&1 == ?o))
3

The String module documentation explains the difference between byte length and string length.
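The two approaches can be combined: take the byte offset from `:binary.match/2`, then count the graphemes in the prefix before it. This is a sketch with a hypothetical module name; unlike the charlist approach it accepts a multi-character substring.

```elixir
defmodule ByteToGrapheme do
  # Converts the byte offset reported by :binary.match/2 into a
  # zero-based grapheme index; returns nil when there is no match.
  # Assumes `substring` is non-empty (an empty pattern is not accepted
  # by :binary.match/2).
  def index(string, substring) do
    case :binary.match(string, substring) do
      {byte_offset, _length} ->
        # The prefix before the match, measured in graphemes.
        string |> binary_part(0, byte_offset) |> String.length()

      :nomatch ->
        nil
    end
  end
end

IO.inspect(ByteToGrapheme.index("aéiou", "o"))   # => 3
IO.inspect(ByteToGrapheme.index("aéiou", "ou"))  # => 3
IO.inspect(ByteToGrapheme.index("aéiou", "z"))   # => nil
```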

Upvotes: 5

Dogbert

Reputation: 222128

I don't think there's any Elixir wrapper for this, see #1119.

You can call :binary.match directly until then:

iex(1)> :binary.match "[email protected]", "@"
{9, 1}
iex(2)> :binary.match "[email protected]", "domain"
{10, 6}

The return value is a tuple containing the index and the length of the match. You can extract just the index by piping into |> elem(0) or using pattern matching.

Note that :binary.match returns :nomatch if the substring isn't found in the string.

Upvotes: 21
