Reputation: 77
I’m working with XML embedded in a syslog message. I used Python to remove the information outside of the <>. Since I’m playing with Julia, I’m trying to figure out a way of doing the same thing. I’ve read about findfirst, but that doesn’t resolve the issue. This is sample data:
Datetime host other stuff <xml data and more data>stuff at the end
What I want is just the data between <>. In Python I use
print(line[line.find("<"):line.find(">")])
Is there anything similar in Julia?
TIA Joe
Upvotes: 3
Views: 403
Reputation: 69819
Alternatively, you can use a regular expression:
julia> str = "Datetime host other stuff <xml data and more data>stuff at the end"
"Datetime host other stuff <xml data and more data>stuff at the end"
julia> rx = r"<(.*?)>"
r"<(.*?)>"
julia> match(rx, str)[1]
"xml data and more data"
If you wanted to use the approach that Oscar proposes then the correct syntax would be:
julia> chop(str[findfirst('<',str):findfirst('>',str)], head=1, tail=1)
"xml data and more data"
Finally note that in Python your code does not give you what you want as it produces:
>>> line = "Datetime host other stuff <xml data and more data>stuff at the end"
>>> print(line[line.find("<"):line.find(">")])
<xml data and more data
and as you can see, the < character is not stripped from the string as you wanted.
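For comparison, here is a minimal Python sketch of a slice that drops both brackets. It assumes the line has a single < followed later by a >; the names start, end and payload are only illustrative.

```python
line = "Datetime host other stuff <xml data and more data>stuff at the end"

# str.find returns -1 when the character is missing, so check before slicing.
start = line.find("<")
end = line.find(">", start + 1)

if start != -1 and end != -1:
    # start + 1 skips the "<"; the slice end is exclusive, so ">" is dropped too.
    payload = line[start + 1:end]
else:
    payload = ""

print(payload)
```

Moving the start index by one is the only change needed relative to the code in the question, since the end of a Python slice is already exclusive.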
Upvotes: 4
Reputation: 42194
Since this is log processing, performance is perhaps somewhat important. In that case use SubString{String} (which does not copy memory). Moreover, you probably want to use findlast when searching for '>'.
SubString(line, findfirst('<', line), findlast('>',line))
This is non-copying and returns a SubString{String} object. Note that the result still includes the enclosing < and >.
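To illustrate why the last > matters: real XML usually contains several tags, so stopping at the first > truncates the payload. Below is a small sketch in Python, the asker's original language, using find and rfind as rough counterparts of findfirst and findlast. The sample line and its <event> markup are made up for illustration, and Python slices copy, so this only shows the index choice, not the non-copying behavior.

```python
# Hypothetical log line whose XML payload contains more than one tag.
line = "Datetime host <event><id>42</id></event> trailing text"

start = line.find("<")               # first opening bracket
first_close = line.find(">", start)  # first closing bracket after it
last_close = line.rfind(">")         # last closing bracket in the line

# Stopping at the first ">" keeps only the opening tag.
truncated = line[start:first_close + 1]
# Stopping at the last ">" keeps the whole XML fragment, brackets included.
full = line[start:last_close + 1]

print(truncated)
print(full)
```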
Upvotes: 3
Reputation: 6378
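If you check the docs for findfirst, it will give you the correct usage: the character to search for comes first and the string second. In this case, what you want is println(line[findfirst('<', line):findfirst('>', line)]), which prints the text with the surrounding < and > still included.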
Upvotes: 2