Reputation: 5682
Suppose I have the following text:
" 7.7597 4.7389 3.0058 0.0013"
I know its format:
" %9.4f %9.4f %9.4f %9.4f"
I want to extract the variables from it. I want something like the sprintf/gettextf functions, but one that does the reverse:
??????(" %9.4f %9.4f %9.4f %9.4f", v1, v2, v3, v4)
How can I do that? (without loading any packages, if possible)
The unreliable method I use right now is:
temp <- as.numeric(unlist(strsplit(" 7.7597 4.7389 3.0058 0.0013"," ")))
temp[!is.na(temp)]
Upvotes: 3
Views: 560
Reputation: 11110
I would do:
scan(text=" 7.7597 4.7389 3.0058 0.0013")
#Read 4 items
#[1] 7.7597 4.7389 3.0058 0.0013
It correctly reports NAs:
scan(text=" 7.7597 NA 4.7389 3.0058 0.0013")
#Read 5 items
#[1] 7.7597 NA 4.7389 3.0058 0.0013
It fails on malformed (non-numeric) input, so you can handle that case with a tryCatch:
tryCatch(scan(text=" abc 7.7597 4.7389"), error= function(e) cat("Malformed input\n"))
#Malformed input
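The tryCatch pattern above can be wrapped into a small helper. This is only a sketch: the name parse_numbers is hypothetical, and returning NULL on failure is one possible convention, not something the original answer prescribes. The quiet = TRUE argument of scan suppresses the "Read N items" message.

```r
# Hypothetical helper: parse a whitespace-separated line of numbers,
# returning NULL instead of raising an error when the input is malformed.
parse_numbers <- function(line) {
  tryCatch(
    scan(text = line, quiet = TRUE),  # quiet = TRUE hides "Read N items"
    error = function(e) NULL          # assumed convention: NULL signals bad input
  )
}

ok  <- parse_numbers(" 7.7597 4.7389 3.0058 0.0013")
bad <- parse_numbers(" abc 7.7597 4.7389")

ok           # numeric vector with the four values
is.null(bad) # TRUE, because "abc" cannot be read as a double
```

Callers can then test the result with is.null() rather than relying on a printed message.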
Under the hood
Why does scan parse the floats properly? The function has an argument, what, that sets the data type you are scanning for. The default is:
scan(..., what=double())
So it handles the floats required in the question. Should your needs change and you want a different data type, try:
scan(text=" 7 4 3 0 ", what=integer())
#Read 4 items
#[1] 7 4 3 0
As usual you can check for data consistency:
tryCatch(scan(text=" 1 2.3", what=integer()), error= function(e) cat("Non-integer value(s) passed!\n"))
#Non-integer value(s) passed!
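The what argument also accepts a list, in which case scan returns one component per field. The sketch below uses that to get closer to the v1, v2, v3, v4 form asked for in the question; the component names are taken from the question, and the assumption is that every line holds exactly four numeric fields.

```r
x <- " 7.7597 4.7389 3.0058 0.0013"

# One list entry per expected field; the names become the names of the result.
fields <- list(v1 = double(), v2 = double(), v3 = double(), v4 = double())

vals <- scan(text = x, what = fields, quiet = TRUE)

vals$v1
# [1] 7.7597
vals$v4
# [1] 0.0013
```

Mixing types works the same way, for example by using character() or integer() for some entries of the list.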
Upvotes: 1
Reputation: 99331
Why not make your method more reliable, instead of searching for something that may not even exist?
> x <- " 7.7597 4.7389 3.0058 0.0013"
> unlist(read.table(text = x, strip.white = TRUE), use.names = FALSE)
# [1] 7.7597 4.7389 3.0058 0.0013
> as.numeric(sapply(strsplit(x, "\\s+"), "[", -1))
# [1] 7.7597 4.7389 3.0058 0.0013
> as.numeric(strsplit(x, "\\s+")[[1]])[-1]
# [1] 7.7597 4.7389 3.0058 0.0013
> library(stringr)
> as.numeric(strsplit(str_trim(x), "\\s+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013
> as.numeric(str_extract_all(x, "-?[0-9]+[.][0-9]+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013
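Since the question asks to avoid loading packages, the two stringr lines can be approximated in base R. This is a sketch under two assumptions: trimws() is available (it is part of base R from version 3.2.0), and the pattern used here is a slightly broader one than in the answer, allowing a leading minus sign and more than one digit before the decimal point.

```r
x <- " 7.7597 4.7389 3.0058 0.0013"

# Base R stand-in for str_trim(): trimws() drops leading and trailing whitespace,
# so the split no longer produces an empty first element.
by_split <- as.numeric(strsplit(trimws(x), "\\s+")[[1]])

# Base R stand-in for str_extract_all(): gregexpr() locates the matches
# and regmatches() extracts them.
pattern  <- "-?[0-9]+[.][0-9]+"  # assumed pattern: optional sign, digits, ".", digits
by_regex <- as.numeric(regmatches(x, gregexpr(pattern, x))[[1]])

by_split
# [1] 7.7597 4.7389 3.0058 0.0013
by_regex
# [1] 7.7597 4.7389 3.0058 0.0013
```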
Upvotes: 0