Reputation: 616
I am trying to build a regular expression to extract a 6-digit number (positive or negative) that follows a certain string, namely 'LogL='.
It comes from text output from certain software.
7 LogL=-3695.47 S2= 9.0808 1891 df 2.263 0.2565
9 LogL= 2456.30 S2= 1.2789 1785 df 1.244 0.1354
I tried the following in R:
txt <- " 9 LogL= 2456.30 S2= 1.2789 1785 df 1.244 0.1354 "
as.numeric(unlist(strsplit(sub(".*LogL=*", "", txt), " "))[1])
This doesn't work for positive numbers, and I imagine it's a very crude/ugly way of going about it. I tried experimenting on regex101.com.
Related Stack Overflow questions I tried: (1) (2) (3)
I am kind of lost and can't seem to understand regular expressions. I am sure this is a piece of cake. Help?
Upvotes: 4
Views: 14556
Reputation: 887981
We can use str_extract_all from stringr
library(stringr)
as.numeric(str_extract_all(txt, "(?<=LogL=\\s{0,1})[-0-9.]+")[[1]])
#[1] -3695.47 2456.30
Or we can use a combination of strsplit and gsub
as.numeric(gsub(".*LogL=\\s*|\\s+.*", "", trimws(strsplit(txt, "\n")[[1]])))
#[1] -3695.47 2456.30
Upvotes: 3
Reputation: 132999
I'd use a look-behind regex:
txt <- " 7 LogL=-3695.47 S2= 9.0808 1891 df 2.263 0.2565
9 LogL= 2456.30 S2= 1.2789 1785 df 1.244 0.1354 "
pattern <- "(?<=LogL\\=)\\s*\\-*[0-9.]+"
m <- gregexpr(pattern, txt, perl = TRUE)
as.numeric(unlist(regmatches(txt, m)))
#[1] -3695.47 2456.30
Upvotes: 7
Reputation: 1751
If you are interested in a non-regex alternative:
library(stringr)
txt <- " 9 LogL= 2456.30 S2= 1.2789 1785 df 1.244 0.1354 "
word(txt, 2, sep = "=") %>% word(2, sep = " ")
It works with positive and negative numbers.
Upvotes: 3
Reputation: 8342
Try
LogL=\s*(-?\d+(?:\.\d+)?)
It matches your text (LogL), an equal sign followed by any number of spaces. Then it captures:
an optional minus sign (-), then at least one digit, then optionally a decimal point (.) followed by at least one digit.
Upvotes: 6
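For reference, here is a minimal sketch of how the LogL=\s*(-?\d+(?:\.\d+)?) pattern could be applied from R. It assumes base R's regexec() with perl = TRUE (available in R 3.3.0 and later) and uses the two sample lines from the question. The names log_lines and logl are illustrative only, and the backslashes are doubled because the pattern is written as an R string.

```r
# Sample lines from the question, one per vector element (illustrative input)
log_lines <- c(
  " 7 LogL=-3695.47 S2= 9.0808 1891 df 2.263 0.2565",
  " 9 LogL= 2456.30 S2= 1.2789 1785 df 1.244 0.1354"
)

# Same pattern as above, with backslashes escaped for an R string literal
pattern <- "LogL=\\s*(-?\\d+(?:\\.\\d+)?)"

# regexec() reports the full match and each capture group;
# regmatches() turns that into one character vector per input line
matches <- regmatches(log_lines, regexec(pattern, log_lines, perl = TRUE))

# Element 2 of each vector is the first capture group; use NA when a line has no match
logl <- as.numeric(vapply(
  matches,
  function(m) if (length(m) >= 2) m[2] else NA_character_,
  character(1)
))

print(logl)
```

Because the whitespace after the equals sign sits outside the capture group, the captured text converts cleanly with as.numeric() for both the negative and the positive value.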