Mohammad Nazari

Reputation: 3025

How do I get a substring of a string in Julia?

Is there a way in Julia to extract the characters between two positions in a string? For example, given s = "Hello, world", I want to get characters 3 through 9.

# output: "llo, wo"

Upvotes: 11

Views: 6318

Answers (3)

gwr

Reputation: 467

As Bogumił Kamiński noted, Julia uses byte indexing, which gets in the way when one wants something similar to the behavior I get in Wolfram Mathematica:

StringTake["Milchmädchen", {8, 12}]
(* "dchen" *)

Coming from a high-level language like the Wolfram Language, I find the behavior in Julia confusing, even for a higher-level function like SubString:

julia> SubString("Milchmädchen", 9, 13)
"dchen"
    
julia> length("Milchmädchen")
12
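To see where the offset of one comes from, here is a short sketch using only Base functions. It compares the number of bytes with the number of characters and converts the character positions 8 and 12 into byte indices with nextind(s, 0, k), which returns the byte index at which the k-th character starts:

```julia
s = "Milchmädchen"

# 'ä' is encoded with two bytes in UTF-8, so there is one more byte than characters
println(ncodeunits(s))   # 13 (bytes)
println(length(s))       # 12 (characters)

# nextind(s, 0, k) returns the byte index at which the k-th character starts
i = nextind(s, 0, 8)     # 9: the 8th character, 'd'
j = nextind(s, 0, 12)    # 13: the 12th character, 'n'

# SubString expects byte indices, so pass the converted positions
println(SubString(s, i, j))   # dchen
```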

So, an immediately reasonable approach might be to work with a collection of characters and concatenate the result of any extractions:

"Works like SubString, but is character indexed"
function stringtake(s::AbstractString, i::Integer, j::Integer=length(s))
    characters = collect(s)
    ind_i, ind_j = max(1, i), min(j, length(s))
    return join(characters[ind_i:ind_j])
end

While this is straightforward, it may be expensive for large strings, as we need to create an array of all characters. Prof. Kamiński has shown other approaches, e.g., using indexing functions. As of v1.9 (thanks @DNF for pointing this out), we may use the graphemes function from the Unicode module in the Julia Standard Library: called with a range, graphemes(s, m:n) returns a SubString consisting of the m-th through n-th graphemes of s.

import Unicode

"Works like SubString, but is character indexed"
function stringtake(s::AbstractString, i::Integer, j::Integer=length(s))
    ind_i, ind_j = max(i, 1), min(j, length(s))
    return Unicode.graphemes(s, ind_i:ind_j)
end
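If you want to avoid both the intermediate character array and the Julia 1.9 requirement, the indexing-function approach can be wrapped the same way. This is a sketch rather than a library function: the name stringtake_view is hypothetical, it clamps positions like the versions above, and it converts them to byte indices with nextind so that it can return a SubString view instead of a copy:

```julia
"Works like stringtake, but returns a SubString view (hypothetical helper)"
function stringtake_view(s::AbstractString, i::Integer, j::Integer=length(s))
    # clamp the requested character positions, as in the versions above
    ind_i, ind_j = max(i, 1), min(j, length(s))
    # nothing to select: return an empty view instead of throwing
    ind_i > ind_j && return SubString(s, 1, 0)
    # convert character positions into byte indices of the first and last selected characters
    start = nextind(s, 0, ind_i)
    stop = nextind(s, 0, ind_j)
    # SubString includes the whole character that starts at byte index stop
    return SubString(s, start, stop)
end

println(stringtake_view("😄 Hello! 👋", 3))   # Hello! 👋
```

Because a SubString shares memory with s, nothing is copied; call String(...) on the result if you need an independent string.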

With one of these implementations in place we can do the following in a REPL:

julia> stringtake("Milchmädchen", 8, 12)
"dchen"

julia> stringtake("Hello, world", 3, 9)
"llo, wo"

julia> stringtake("😄 Hello! 👋", 3)
"Hello! 👋"

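One caveat about the graphemes version: collect and length work with Chars (Unicode code points), whereas graphemes counts user-perceived characters, and the two counts differ when a string contains combining marks. The sketch below assumes Julia 1.9 or later, and the string is an illustrative example rather than one from the answer:

```julia
import Unicode

# "café!" with the é written as 'e' followed by U+0301 COMBINING ACUTE ACCENT
s = "cafe\u0301!"

println(length(s))                              # 6 Chars (code points)
println(length(collect(Unicode.graphemes(s))))  # 5 graphemes

# character 4 is the bare 'e'; grapheme 4 is 'e' plus the accent
println(collect(s)[4] == 'e')                   # true
println(Unicode.graphemes(s, 4:4) == "e\u0301") # true
```

For such strings, the upper bound min(j, length(s)) in the graphemes version can exceed the number of graphemes, so clamping against length(collect(Unicode.graphemes(s))) may be the safer choice. On strings like the ones above, where every character is a single code point, both versions agree.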
Upvotes: 0

Bogumił Kamiński

Reputation: 69819

The other solution works only for ASCII strings, because Julia uses byte indexing, not character indexing, in getindex syntax, as I discussed on my blog some time ago. If you want character indexing (which I assume you do, from the wording of your question), here is a solution using macros.

In general (without using the solution linked above), the functions to use are chop, first, and last, or, for index manipulation, prevind, nextind, and length.

So, for example, safe ways to get characters 3 to 9 include the following (just showing several combinations):

julia> str = "😄 Hello! 👋"
"😄 Hello! 👋"

julia> last(first(str, 9), 7)
"Hello! "

julia> chop(str, head=2, tail=length(str)-9)
"Hello! "

julia> chop(first(str, 9), head=2, tail=0)
"Hello! "

julia> str[(:)(nextind.(str, 0, (3, 9))...)]
"Hello! "

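prevind, from the list above, is handy when the end of the range is counted from the end of the string. A small sketch (dropping exactly one character from the end just reproduces the 3 to 9 example for this 10-character string):

```julia
str = "😄 Hello! 👋"

# byte index where the 3rd character starts
start = nextind(str, 0, 3)
# byte index where the 2nd character from the end starts;
# ncodeunits(str) + 1 is the position just past the last byte
stop = prevind(str, ncodeunits(str) + 1, 2)

println(str[start:stop])   # characters 3 to 9
```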
Note though that the following is incorrect:

julia> str[3:9]
ERROR: StringIndexError: invalid index [3], valid nearby indices [1]=>'😄', [5]=>' '
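The error arises because byte 3 falls inside the four-byte UTF-8 encoding of 😄. As a sketch, Base's isvalid and thisind let you check or round a byte index before using it, and eachindex lists the valid ones:

```julia
str = "😄 Hello! 👋"

# byte 3 lies inside the 4-byte encoding of '😄', so it is not a valid index
println(isvalid(str, 3))           # false
# thisind rounds an index down to the start of the character containing it
println(thisind(str, 3))           # 1
# the valid byte indices, one per character
println(collect(eachindex(str)))   # [1, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```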

There is an open issue to make chop more flexible, which would simplify your specific indexing case.

Upvotes: 14

Mohammad Nazari

Reputation: 3025

You can use the following method:

s="Hello, world"

s[3:9]
# output: llo, wo

s[3:end]
# output: llo, world
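This works because "Hello, world" is pure ASCII, so byte and character positions coincide. A cautious sketch (substr is a hypothetical name, and unlike the stringtake versions it does not clamp out-of-range positions) checks isascii first and otherwise converts positions with nextind:

```julia
# hypothetical helper: byte ranges for ASCII strings, character positions otherwise
function substr(s::AbstractString, i::Integer, j::Integer)
    if isascii(s)
        return s[i:j]   # every character is one byte, so byte index == character position
    end
    # convert the i-th and j-th character positions into byte indices
    return s[nextind(s, 0, i):nextind(s, 0, j)]
end

println(substr("Hello, world", 3, 9))   # llo, wo
println(substr("😄 Hello! 👋", 3, 9))   # Hello! (with a trailing space)
```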

Upvotes: 1
