Reputation: 13
I have lines that look like this:
01:04:43.064 [12439] <2> xyz
01:04:43.067 [12439] <2> a lmn
01:04:43.068 [12439] <4> j klm
x_times_wait to <3000>
01:04:43.068 [12439] <4> j klm
enter_object <5000> main k
I want a regex that extracts only the values after the angle brackets, and only for lines that start with a timestamp.
This is what I have tried, assuming that these lines are in a data frame called nn:
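library(stringr)  # provides str_split_fixed()
# split each line at the first ">" into two pieces
split <- str_split_fixed(nn[, 1], ">", 2)
split2 <- data.frame(split[, 2])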
The problem is that split2 gives
xyz
a lmn
j klm
j klm
main k
How can I make sure that the empty line and main k are not returned?
Upvotes: 0
Views: 558
Reputation: 4767
Using rex may make this type of task a little simpler.
string <- "01:04:43.064 [12439] <2> xyz
01:04:43.067 [12439] <2> a lmn
01:04:43.068 [12439] <4> j klm
x_times_wait to <3000>
01:04:43.068 [12439] <4> j klm
enter_object <5000> main k"
library(rex)
timestamp <- rex(n(digit, 2), ":", n(digit, 2), ":", n(digit, 2), ".", n(digit, 3))
re <- rex(timestamp, space,
          "[", digits, "]", space,
          "<", digits, ">", space,
          capture(anything))
re_matches(string, re, global = TRUE)
#> [[1]]
#>       1
#> 1   xyz
#> 2 a lmn
#> 3 j klm
#> 4 j klm
Upvotes: 0
Reputation: 99331
If a timestamp is defined as 1 or more digits followed by a :, followed by 1 or more digits and another :, and then 1 or more digits, then perhaps this method would work for you.
x <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
       "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
       "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")
sub(".*> ", "", x[grepl("\\d+:\\d+:\\d+", x)])
# [1] "xyz" "a lmn" "j klm" "j klm"
This removes all the non-timestamp elements first, then extracts the text after > from the remaining elements.
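Since the question asks for lines that start with a timestamp, a slightly stricter variant may be worth considering. The sketch below assumes the timestamp always has the form HH:MM:SS.mmm at the very beginning of the line, and it stops at the first > rather than the last one, in case the message text itself contains a >. The names has_timestamp and values are only illustrative.

```r
x <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
       "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
       "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")

# Keep only elements that begin with HH:MM:SS.mmm (assumed timestamp format)
has_timestamp <- grepl("^\\d{2}:\\d{2}:\\d{2}\\.\\d{3}\\s", x, perl = TRUE)

# Remove everything up to and including the first ">" plus any whitespace after it
values <- sub("^[^>]*>\\s*", "", x[has_timestamp], perl = TRUE)
values
# [1] "xyz"   "a lmn" "j klm" "j klm"
```

On the sample data this gives the same result as the version above; the two differ only when a digits:digits:digits pattern appears mid-line or when the message contains "> ".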
Upvotes: 2
Reputation: 81693
Here's an approach in base R:
The regex:
^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+
You can use it with gregexpr:
unlist(regmatches(vec, gregexpr("^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+",
                                vec, perl = TRUE)))
# [1] "xyz" "a lmn" "j klm" "j klm"
where vec is the vector containing your strings.
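For readers unfamiliar with \K: in PCRE it discards everything matched so far from the reported match, which is why only the text after > comes back. A minimal illustration on a single line, using a shortened pattern (not the full regex above) and the hypothetical names s, with_k and without_k:

```r
s <- "01:04:43.064 [12439] <2> xyz"

# With \K, the reported match starts after "<2> "
with_k <- regmatches(s, regexpr("<\\d+>\\s*\\K.+", s, perl = TRUE))

# Without \K, the bracketed number and the space are part of the match
without_k <- regmatches(s, regexpr("<\\d+>\\s*.+", s, perl = TRUE))

with_k     # "xyz"
without_k  # "<2> xyz"
```

Note that \K is a PCRE feature, so perl = TRUE is required.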
Upvotes: 0
Reputation: 67968
\d+(?::\d+){2}\.\d+\s+\[[^\]]+\]\s+<\d+>(.+)$
Instead of splitting, try matching and take capture group 1. See the demo:
https://regex101.com/r/vN3sH3/16
Alternatively, split on (?<=<\d>) and take the second piece (split2).
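The regex above is written in plain PCRE syntax, so in R the backslashes need doubling. One possible way to apply the capture-group idea in base R is sketched below. Two small adjustments here are my own assumptions, not part of the original pattern: a leading ^ so that the timestamp must start the line, and \s* before the group so that the captured value has no leading space. The names pattern, m and values are illustrative.

```r
x <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
       "01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
       "01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")

pattern <- "^\\d+(?::\\d+){2}\\.\\d+\\s+\\[[^\\]]+\\]\\s+<\\d+>\\s*(.+)$"

# regexec() returns the full match plus capture groups for each element;
# elements that do not match yield character(0)
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Drop non-matching elements, then take group 1 (the second entry)
values <- vapply(Filter(length, m), function(g) g[2], character(1))
values
# [1] "xyz"   "a lmn" "j klm" "j klm"
```

The perl argument of regexec() needs a reasonably recent R version; on older versions, regexpr() with a \K-style pattern is an alternative.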
Upvotes: 3