Eric H

Reputation: 1353

Elixir characters being treated as integers, not characters, when splitting into head and tail

I'm working on a basic problem in Elixir: RNA transcription. However, I'm hitting some unexpected (to me) behavior with my solution:

defmodule RnaTranscription do
  @doc """
  Transcribes a character list representing DNA nucleotides to RNA

  ## Examples

  iex> RnaTranscription.to_rna('ACTG')
  'UGAC'
  """
  @spec to_rna([char]) :: [char]
  def to_rna(dna) do
    _to_rna(dna)
  end

  def _to_rna([]), do: ''
  def _to_rna([head | tail]), do: [_rna(head) | _to_rna(tail)]

  def _rna(x) when x == 'A', do: 'U' 
  def _rna(x) when x == 'C', do: 'G'
  def _rna(x) when x == 'T', do: 'A'
  def _rna(x) when x == 'G', do: 'C'
end

When I run the solution, I get an error: the _rna function is invoked with an integer rather than a character, so none of the guard clauses match.

The following arguments were given to RnaTranscription._rna/1:

        # 1
        65

    lib/rna_transcription.ex:18: RnaTranscription._rna/1
    lib/rna_transcription.ex:16: RnaTranscription._to_rna/1

Is there a way to force Elixir to keep the value as a character when it splits the list into head and tail?

Upvotes: 0

Views: 103

Answers (2)

Aleksei Matiushkin

Reputation: 121000

Besides the difference between the charlist 'A' and the character ?A, which Michael has answered perfectly, this code has one more hidden but important glitch.

Your recursion is not tail-call-optimized, which should be avoided at all costs: in general, it can lead to a stack overflow. Below is the TCO version.

defmodule RnaTranscription do
  def to_rna(dna), do: do_to_rna(dna)

  defp do_to_rna(acc \\ [], []), do: Enum.reverse(acc)
  defp do_to_rna(acc, [char | tail]),
    do: do_to_rna([do_char(char) | acc], tail)

  defp do_char(?A), do: ?U 
  defp do_char(?C), do: ?G
  defp do_char(?T), do: ?A
  defp do_char(?G), do: ?C
end

RnaTranscription.to_rna('ACTG')
#⇒ 'UGAC'

Or, even better, with a comprehension:

converter = fn
  ?A -> ?U 
  ?C -> ?G
  ?T -> ?A
  ?G -> ?C
end

for c <- 'ACTG', do: converter.(c)
#⇒ 'UGAC'

You might even filter it in place:

for c when c in 'ACTG' <- 'ACXXTGXX',
  do: converter.(c)
#⇒ 'UGAC'
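For completeness, the filtering comprehension above could be wrapped in the question's module roughly as follows. This is only a sketch: the helper name `complement/1` is made up for illustration, the allowed nucleotides are spelled out as a list of code points, and unknown characters are silently dropped, which may or may not be what the exercise expects.

```elixir
defmodule RnaTranscription do
  @spec to_rna([char]) :: [char]
  def to_rna(dna) do
    # Keep only known nucleotides, then map each one to its complement.
    for c when c in [?A, ?C, ?T, ?G] <- dna, do: complement(c)
  end

  # `complement/1` is a hypothetical helper name, not part of the original code.
  defp complement(?A), do: ?U
  defp complement(?C), do: ?G
  defp complement(?T), do: ?A
  defp complement(?G), do: ?C
end

# ~c"..." is the sigil spelling of a single-quoted charlist.
IO.inspect(RnaTranscription.to_rna(~c"ACXXTGXX") == ~c"UGAC")
# prints: true
```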

Upvotes: 1

Michael Smith

Reputation: 41

You can use the ? code point operator:

  # Return code points (?U), not charlists ('U'), so the result stays a flat charlist
  def _rna(x) when x == ?A, do: ?U
  def _rna(x) when x == ?C, do: ?G
  def _rna(x) when x == ?T, do: ?A
  def _rna(x) when x == ?G, do: ?C

Strictly speaking, Elixir is already keeping the value as a character! A character is a code point, which is an integer. When you match on 'A', you are matching on a charlist, which is a list of integers. That is, you are trying to match 65 to [65].
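A quick way to see this is to compare the values directly. The snippet below is a small illustration and can be pasted into iex; `~c"A"` is the sigil spelling of the charlist 'A'.

```elixir
# ?A is just the integer code point of the character "A".
IO.inspect(?A == 65)
# prints: true

# A charlist is a list of such integers.
IO.inspect(~c"A" == [65])
# prints: true

# Splitting a charlist gives an integer head and a charlist tail.
[head | tail] = ~c"ACTG"
IO.inspect(head == ?A)
# prints: true
IO.inspect(tail == ~c"CTG")
# prints: true

# The head only equals the charlist 'A' once it is wrapped in a list again.
IO.inspect([head] == ~c"A")
# prints: true
```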

Upvotes: 1
