HBat

Reputation: 5682

Reading text using a function like sprintf in R

Suppose I have the following text:

"    7.7597     4.7389     3.0058     0.0013"

I know its format:

" %9.4f  %9.4f  %9.4f  %9.4f"

I want to extract the variables from it. I want something like the sprintf/gettextf functions, but one that does the reverse:

??????(" %9.4f  %9.4f  %9.4f  %9.4f", v1, v2, v3, v4)

How can I do that? (without loading any packages, if possible)

The unreliable method I use right now is:

temp <- as.numeric(unlist(strsplit("    7.7597     4.7389     3.0058     0.0013"," ")))
temp[!is.na(temp)]

Upvotes: 3

Views: 560

Answers (2)

antonio

Reputation: 11110

I would do:

scan(text="  7.7597     4.7389     3.0058     0.0013")
#Read 4 items
#[1] 7.7597 4.7389 3.0058 0.0013

It correctly reports NAs:

scan(text="   7.7597  NA   4.7389     3.0058     0.0013")
#Read 5 items
#[1] 7.7597     NA 4.7389 3.0058 0.0013

It throws an error on malformed (non-numeric) input, which you can handle with tryCatch:

tryCatch(scan(text=" abc  7.7597  4.7389"), error= function(e) cat("Malformed input\n")) 
#Malformed input 
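The same idea can be wrapped in a small helper. This is only a sketch: parse_numbers is a hypothetical name, not a base R function, and returning NULL on failure is just one possible convention. It uses scan's quiet argument to suppress the "Read n items" message.

```r
# Hypothetical helper (not part of base R): parse whitespace-separated
# numbers, returning NULL instead of stopping when the input is malformed.
parse_numbers <- function(txt) {
  tryCatch(
    scan(text = txt, what = double(), quiet = TRUE),
    error = function(e) NULL
  )
}

good <- parse_numbers("    7.7597     4.7389     3.0058     0.0013")
bad  <- parse_numbers(" abc  7.7597  4.7389")

print(good)          # the four parsed values
print(is.null(bad))  # TRUE when parsing failed
```

The caller can then test the result with is.null() instead of wrapping every call in its own tryCatch.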

Under the hood

Why does scan parse the floats correctly? The function has an argument, what, that sets the data type to scan for. The default is

scan(...,  what=double())

So it handles the floats in the question well. If your needs change and you are looking for a different data type, try:

scan(text="  7  4  3  0 ", what=integer())
#Read 4 items
#[1] 7 4 3 0

As usual you can check for data consistency:

tryCatch(scan(text=" 1 2.3", what=integer()), error= function(e) cat("Non-integer value(s) passed!\n")) 
#Non-integer value(s) passed!

Upvotes: 1

Rich Scriven

Reputation: 99331

Why not make your method more reliable, instead of searching for something that may not even exist?

> x <- "    7.7597     4.7389     3.0058     0.0013"

> unlist(read.table(text = x, strip.white = TRUE), use.names = FALSE)
# [1] 7.7597 4.7389 3.0058 0.0013

> as.numeric(sapply(strsplit(x, "\\s+"), "[", -1))
# [1] 7.7597 4.7389 3.0058 0.0013

> as.numeric(strsplit(x, "\\s+")[[1]])[-1]
# [1] 7.7597 4.7389 3.0058 0.0013

> library(stringr)
> as.numeric(strsplit(str_trim(x), "\\s+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013

> as.numeric(str_extract_all(x, "[0-9]+[.][0-9]+")[[1]])
# [1] 7.7597 4.7389 3.0058 0.0013
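If the field widths really are fixed, as the format in the question suggests, another base-R option is to cut the string by position rather than by whitespace. The sketch below assumes the widths implied by " %9.4f  %9.4f  %9.4f  %9.4f": 1 + 9 characters for the first field and 2 + 9 for each of the others. The variable names are only illustrative.

```r
x <- "    7.7597     4.7389     3.0058     0.0013"

# Assumed field widths derived from the format string:
# " %9.4f" is 10 characters, each "  %9.4f" is 11 characters.
widths <- c(10, 11, 11, 11)

# Start and end positions of each field within the string.
ends   <- cumsum(widths)
starts <- ends - widths + 1

# substring() is vectorized over the start/end positions;
# as.numeric() ignores the surrounding whitespace in each field.
fields <- substring(x, starts, ends)
values <- as.numeric(fields)

print(values)
```

Unlike splitting on whitespace, this still separates the fields when two numbers run together without a space between them, but it depends on every line having exactly the assumed layout.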

Upvotes: 0
