tstev

Reputation: 616

Extract number after a certain word

I am trying to build a regular expression to extract a 6-digit number (positive or negative) that follows a certain string, namely 'LogL='.

The text is output from a particular piece of software:

   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565    
   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354    

I tried the following in R:

txt <- "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
as.numeric(unlist(strsplit(sub(".*LogL=*", "", txt), " "))[1])

This doesn't work for positive numbers, and I imagine it's a very crude/ugly way of going about it. I also tried experimenting on regex101.com.

Related Stack Overflow questions I tried: (1) (2) (3)

I am kind of lost and can't seem to understand regular expressions. I am sure this is a piece of cake. Help?

Upvotes: 4

Views: 14556

Answers (4)

akrun

Reputation: 887981

We can use str_extract_all from stringr:

library(stringr)
# txt is assumed to hold both lines of the sample output
as.numeric(str_extract_all(txt, "(?<=LogL=\\s{0,1})[-0-9.]+")[[1]])
#[1] -3695.47  2456.30

Or we can use a combination of strsplit and gsub:

as.numeric(gsub(".*LogL=\\s*|\\s+.*", "", trimws(strsplit(txt, "\n")[[1]])))
#[1] -3695.47  2456.30

Upvotes: 3

Roland

Reputation: 132999

I'd use a look-behind regex:

txt <- "   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565    
           9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
pattern <- "(?<=LogL\\=)\\s*\\-*[0-9.]+"
m <- gregexpr(pattern, txt, perl = TRUE)
as.numeric(unlist(regmatches(txt, m)))
#[1] -3695.47  2456.30

Upvotes: 7

thepule

Reputation: 1751

If you are interested in a non-regex alternative:

library(stringr)
txt <- "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
word(txt, 2, sep = "=") %>% word(2, sep = " ")

It works with positive and negative numbers.

Upvotes: 3

SamWhan

Reputation: 8342

Try

LogL=\s*(-?\d+(?:\.\d+)?)

It matches your text (LogL), then an equals sign, followed by any number of whitespace characters. Then it captures:

  • an optional -
  • digits, at least one
  • and optionally, a . followed by at least one digit.

Check it here at regex101.
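
For use in R, the pattern above could be applied roughly as follows. This is only a sketch: the `lines` vector is an assumption built from the sample output in the question, and the backslashes are doubled because the pattern sits inside an R string literal. `perl = TRUE` is passed so that `\d` and the non-capturing group are handled by the PCRE engine.

```r
# Sample lines copied from the question (assumed to be one element per line).
lines <- c(
  "   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565    ",
  "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354    "
)

# Same pattern as above, with backslashes escaped for an R string.
pattern <- "LogL=\\s*(-?\\d+(?:\\.\\d+)?)"

# regexec() returns the full match followed by each capture group,
# so the second element of each result is the captured number.
matches <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
logl <- as.numeric(vapply(matches, function(m) m[2], character(1)))
logl
```

Lines that do not contain LogL= come back as NA instead of raising an error, which may be convenient when scanning a whole log file.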

Upvotes: 6
