silent_hunter

Reputation: 2508

Grep text and put in data frame

After running the checkresiduals() function from the forecast package, I got this output:

#Result of checkresiduals() function
test <- "Q* = 4.5113, df = 4.6, p-value = 0.4237"

Now I want to split this line of text, with grep() or other functions, into a data.frame with three columns (Q*, df, p-value), as in the example below:

    Q*         df       p-value
    4.5113     4.6      0.4237

Can anyone help me with this code?

Upvotes: 2

Views: 169

Answers (3)

Uwe

Reputation: 42544

Here are two alternative approaches:

  1. Convert string into DCF format and use read.dcf()
  2. "Computing on the language": Convert string into a valid R expression and use parse() / eval()

read.dcf()

Use the read.dcf() function after the string test is converted into DCF (Debian Control File) format.
(BTW, the DESCRIPTION file of each R package is in DCF format.)

library(magrittr) # piping used for readability
test %>% 
  stringr::str_replace_all("\\s*=\\s*", ": ") %>%  # replace " = " by ": " (no blank left in front of the colon)
  stringr::str_replace_all(",\\s*", "\n") %>%    # replace ", " by line break
  textConnection() %>% 
  read.dcf(all = TRUE)
#       Q*  df p-value
# 1 4.5113 4.6  0.4237

All columns are of type character.
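
If numeric columns are needed, the conversion can be done afterwards. The following is only a minimal base R sketch of the same DCF idea (no magrittr or stringr); the names dcf_text, con and res are chosen for illustration and are not part of the code above.

```r
test <- "Q* = 4.5113, df = 4.6, p-value = 0.4237"

# Rewrite "name = value, ..." as one "name: value" line per field
dcf_text <- gsub(",\\s*", "\n", gsub("\\s*=\\s*", ": ", test))

con <- textConnection(dcf_text)
res <- read.dcf(con, all = TRUE)   # data.frame, all columns character
close(con)                         # close the connection explicitly

# Convert every column to numeric while keeping the column names
res[] <- lapply(res, as.numeric)
str(res)
```

Assigning to res[] replaces the column contents but keeps the data.frame structure, so syntactically invalid names such as Q* and p-value are preserved.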

Computing on the language

library(magrittr) # piping used for readability
test %>%   
  stringr::str_replace_all("(\\S+) =", "`\\1` =") %>% 
  paste0("data.frame(", ., ", check.names = FALSE)") %>% 
  parse(text = .) %>% 
  eval()
#       Q*  df p-value
# 1 4.5113 4.6  0.4237

All columns are of type double.

test %>%   
  stringr::str_replace_all("(\\S+) =", "`\\1` =") %>%   
  paste0("data.frame(", ., ", check.names = FALSE)")

returns

"data.frame(`Q*` = 4.5113, `df` = 4.6, `p-value` = 0.4237, check.names = FALSE)"

which is then parsed into an expression and evaluated.

Note that all variable names are quoted to handle syntactically invalid variable names like Q* and p-value.

Upvotes: 1

akrun

Reputation: 887078

Here is one way with the tidyverse:

library(tidyverse)
tibble(test) %>% 
    separate_rows(test, sep = ",\\s*") %>% 
    separate(test, into = c("v1", 'v2'), sep= " = ") %>% 
    deframe %>%
    as.list %>% 
    as_tibble
# A tibble: 1 x 3
#  `Q*`   df    `p-value`
#  <chr>  <chr> <chr>    
#1 4.5113 4.6   0.4237   

Alternatively, the string can be converted into JSON and read easily with jsonlite:

library(jsonlite)
data.frame(fromJSON(paste0("{", gsub('([^0-9, ]+)(?: \\=)', '"\\1":', 
               test), "}")), check.names = FALSE) 
#       Q*  df p-value
#1 4.5113 4.6  0.4237

Upvotes: 2

jay.sf

Reputation: 72758

You could use strsplit.

# split into "name = value" fields, then into names (row 1) and values (row 2)
tmp <- do.call(cbind, strsplit(strsplit(test, ", ")[[1]], " = "))
d <- setNames(data.frame(t(as.numeric(tmp[2, ]))), tmp[1, ])
d
#       Q*  df p-value
# 1 4.5113 4.6  0.4237
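
If several such strings have to be parsed at once, the same strsplit() idea can be wrapped in a small helper. This is only a sketch: parse_stats() and the second input string are made up for illustration, and every string is assumed to contain the same fields in the same order.

```r
test <- "Q* = 4.5113, df = 4.6, p-value = 0.4237"

# Hypothetical helper: returns one row per input string
parse_stats <- function(x) {
  rows <- lapply(strsplit(x, ",\\s*"), function(fields) {
    kv <- do.call(rbind, strsplit(fields, "\\s*=\\s*"))  # col 1: names, col 2: values
    setNames(as.numeric(kv[, 2]), kv[, 1])
  })
  as.data.frame(do.call(rbind, rows))  # column names come from the first string
}

# The second string is invented purely to show the multi-row case
res <- parse_stats(c(test, "Q* = 1.5, df = 3, p-value = 0.68"))
str(res)
```

Going through a matrix before as.data.frame() keeps the names Q* and p-value as they are, so no check.names = FALSE is needed here.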

Upvotes: 3
