Joe Hughes

Reputation: 77

Pulling a substring in Julia

I’m working with XML embedded in a syslog message. I used Python to remove the information outside of the <>. Since I’m playing with Julia I’m trying to figure out a way of doing the same thing. I’ve read about findfirst, but that doesn’t resolve the issue. This is sample data.

Datetime host other stuff <xml data and more data>stuff at the end

What I want is just the data between <>. In Python I use

print(line[line.find("<"):line.find(">")])

Is there anything similar in Julia?

TIA Joe

Upvotes: 3

Views: 403

Answers (3)

Bogumił Kamiński

Reputation: 69819

Alternatively, you can use a regular expression:

julia> str = "Datetime host other stuff <xml data and more data>stuff at the end"
"Datetime host other stuff <xml data and more data>stuff at the end"

julia> rx = r"<(.*?)>"
r"<(.*?)>"

julia> match(rx, str)[1]
"xml data and more data"
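One thing to watch for in log processing: match returns nothing when the pattern is not found, so indexing the result with [1] would throw on a line without brackets. A minimal sketch of a guard, assuming a helper name (extract_between) that is purely illustrative and not part of Julia:

```julia
# Hypothetical helper (the name is an assumption, not a Julia built-in):
# returns the text between the first '<' and the next '>',
# or `nothing` when the line contains no such pair.
function extract_between(line::AbstractString)
    m = match(r"<(.*?)>", line)
    # `match` yields `nothing` on failure, so check before indexing.
    return m === nothing ? nothing : m[1]
end

line = "Datetime host other stuff <xml data and more data>stuff at the end"
println(extract_between(line))                 # text between the brackets
println(extract_between("no brackets here"))   # prints "nothing"
```

Callers can then skip lines for which the result is nothing instead of wrapping every call in try/catch.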

If you wanted to use the approach that Oscar proposes then the correct syntax would be:

julia> chop(str[findfirst('<',str):findfirst('>',str)], head=1, tail=1)
"xml data and more data"

Finally note that in Python your code does not give you what you want as it produces:

>>> line = "Datetime host other stuff <xml data and more data>stuff at the end"
>>> print(line[line.find("<"):line.find(">")])
<xml data and more data

As you can see, the < character is not stripped from the string as you wanted.

Upvotes: 4

Przemyslaw Szufel

Reputation: 42194

Since this is log processing, performance is probably somewhat important. In that case use SubString{String}, which does not copy the underlying memory. You also probably want to use findlast when searching for '>'.

SubString(line, findfirst('<', line), findlast('>',line))

This is non-copying and returns a SubString{String} object.
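Note that the call above keeps the enclosing < and > and fails if either character is missing, because findfirst and findlast then return nothing. A rough sketch that strips the brackets and handles the missing case, assuming a hypothetical helper name (between_brackets):

```julia
# Hypothetical helper (the name is an assumption): a non-copying view of the
# text strictly between the first '<' and the last '>', or `nothing`
# if the pair is missing or out of order.
function between_brackets(line::AbstractString)
    i = findfirst('<', line)
    j = findlast('>', line)
    (i === nothing || j === nothing || j < i) && return nothing
    # nextind/prevind step over the brackets and stay valid for non-ASCII text.
    return SubString(line, nextind(line, i), prevind(line, j))
end

line = "Datetime host other stuff <xml data and more data>stuff at the end"
s = between_brackets(line)
println(s)          # text between the brackets
println(typeof(s))  # SubString{String}, a view into `line`
```

Using findlast matters when the payload itself contains '>': for "a <b>c> d" this sketch returns "b>c", whereas searching with findfirst for '>' would stop at "b".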

Upvotes: 3

Oscar Smith

Reputation: 6378

If you check the docs for findfirst, it will give you the correct usage. In this case, what you want is println(line[findfirst('<', line):findfirst('>', line)])

Upvotes: 2
