Mikko Marttila

Reputation: 11878

Recreate vector from print() console output

Regrettably often you see questions on SO that present data in a format that's not reproducible; often just the copied result of print() ...

set.seed(1)

x <- sample(LETTERS, 40, replace = TRUE)
y <- rnorm(20)

... such as this:

x
 [1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U" "M" "S"
[18] "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I" "M" "P" "M" "E"
[35] "V" "R" "U" "C" "S" "K"

... or this:

y
 [1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
 [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575

Ideally I'd like to be able to copy, for example, the text from the chunk above to my clipboard, and call some function foo() such that all.equal(foo(), x) holds for discrete data types, and all(dplyr::near(foo(), y)) holds for floats (given the printed accuracy).

Is there an easy way to (approximately) reconstruct a simple vector from the copied result of print()'ing it?


Edit: Ironically, I realized that my own example wasn't exactly fully reproducible. Here's the code to create the copied print output:

y_printed <- capture.output(y)

Upvotes: 5

Views: 532

Answers (3)

Mikko Marttila

Reputation: 11878

For my use, I ended up modifying @RuiBarradas' answer a little bit to include some features I wanted: reading from the clipboard, and type guessing (with the help of readr).

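rescue_vector <- function(x = readClipboard()) {
  # Strip the "[n]" index at the start of each line before unquoting
  x <- gsub("(^|\n)\\s*\\[\\d+\\]", "", x)
  # Split on whitespace, honouring quotes and escapes such as \n
  x <- scan(text = x, what = character(),
            allowEscapes = TRUE, quiet = TRUE)
  # Guess the type; na = character() tells readr to treat no string as NA
  readr::parse_guess(x, na = character())
}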

It works on the given example data:

set.seed(1)

x <- sample(LETTERS, 40, replace = TRUE)
all.equal(x, rescue_vector(capture.output(x)))
#> [1] TRUE

y <- rnorm(20)
all.equal(y, rescue_vector(capture.output(y)))
#> [1] TRUE

And reads from the clipboard (readClipboard() and writeClipboard() are only available on Windows):

writeClipboard(capture.output(y))
all.equal(y, rescue_vector())
#> [1] TRUE

And also some strange cases:

z <- c("[1] first \n second", "[2] + 1")
all.equal(z, rescue_vector(capture.output(z)))
#> [1] TRUE
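
These cases appear to survive because of the order of operations: the [n] index is stripped from the raw printed lines before scan() removes the quotes, so an element that itself begins with something like [2] is still shielded by its opening quote when the pattern is applied. A minimal sketch of the difference follows; strip_then_scan() and scan_then_strip() are hypothetical names used only for this illustration.

```r
# Printed form of c("[2] + 1", "ok"), as print() would show it
printed <- '[1] "[2] + 1" "ok"'

# Hypothetical helper: remove the line index first, then unquote with scan()
strip_then_scan <- function(x) {
  x <- gsub("(^|\n)\\s*\\[\\d+\\]", "", x)
  scan(text = x, what = character(), quiet = TRUE)
}

# Hypothetical helper: unquote first, then remove anything that looks like an index
scan_then_strip <- function(x) {
  x <- scan(text = x, what = character(), quiet = TRUE)
  x <- sub("^\\s*\\[\\d+\\]", "", x)
  x[x != ""]
}

strip_then_scan(printed)  # keeps "[2] + 1" intact
scan_then_strip(printed)  # the leading "[2]" of the first element is lost
```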

But missing values remain an issue:

na <- c("", "NA", NA)
rescue_vector(capture.output(na))
#> [1] "" NA NA
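
One possible way around this, for character vectors only, is to tokenise the printed text directly so that the quotes are still present when deciding whether a token is the string "NA" or a missing value. The sketch below is not part of the function above; rescue_chr() is a hypothetical name, and it assumes the output was printed with quotes (the print() default).

```r
# Hypothetical helper for character vectors only: it tokenises the printed
# text itself, so the quotes are still there to tell "NA" apart from NA.
rescue_chr <- function(x) {
  x <- paste(x, collapse = "\n")
  # A token is either a double-quoted string (backslash escapes allowed)
  # or a bare NA; the [n] line indices match neither and are skipped.
  pattern <- '"(?:[^"\\\\]|\\\\.)*"|\\bNA\\b'
  tokens <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  # Every token is a string literal or NA, so parsing them as the arguments
  # of c() undoes the escaping without running arbitrary code.
  args <- paste(tokens, collapse = ", ")
  as.character(eval(parse(text = paste0("c(", args, ")"))))
}

na <- c("", "NA", NA)
identical(rescue_chr(capture.output(na)), na)
```

Because it only recognises quoted strings and bare NA, this sketch would not help with numeric output or with vectors printed using quote = FALSE.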

As @Moody_Mudskipper mentioned in the comments, further developments could include rescue attempts for pasted tables, too.
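
As a rough starting point for that idea, a printed data frame can often be read back with read.table(). The sketch below is an assumption about how such a helper might look, not something from the answers here; rescue_table() is a hypothetical name, and it only covers simple tables whose cells contain no spaces and whose columns were not wrapped across several blocks.

```r
# Hypothetical helper: rebuild a data frame from its printed form.
rescue_table <- function(x) {
  # The header line has one field fewer than the data lines, so read.table()
  # treats the first column (the printed row numbers) as row names.
  read.table(text = x, header = TRUE, stringsAsFactors = FALSE)
}

df <- data.frame(id = 1:3, grp = c("a", "b", "c"), stringsAsFactors = FALSE)
out <- rescue_table(capture.output(df))
str(out)
```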

Upvotes: 0

Rui Barradas

Reputation: 76402

I use scan for that problem.

Can you make a function out of the below code?

y <-
  '[1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
 [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575'

y <- scan(what = character(), text = y)
y <- sub("^\\s*\\[\\d+\\]", "", y)
y <- as.numeric(y[y != ""])

With the suggestion in the comment by @Moody_Mudskipper,

Pattern can be updated to "^\\s*\\[\\d+\\]" to support OP's example (which starts with a space).

a function could be

recreateVector <- function(X, numeric = TRUE, quiet = FALSE){
  X <- scan(what = character(), text = X, quiet = quiet)
  X <- sub("^\\s*\\[\\d+\\]", "", X)
  X <- X[X != ""]
  if(numeric) X <- as.numeric(X)
  X
}


recreateVector(y)   # Use the original y
#Read 24 items
# [1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
# [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
#[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
#[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575

For a character vector, set the argument numeric = FALSE (the default is TRUE).

x <-
'[1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U" "M" "S"
[18] "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I" "M" "P" "M" "E"
[35] "V" "R" "U" "C" "S" "K"'

recreateVector(x, numeric = FALSE)
#Read 43 items
# [1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U"
#[16] "M" "S" "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I"
#[31] "M" "P" "M" "E" "V" "R" "U" "C" "S" "K"

Note the argument quiet. I have set its default to FALSE, as in the definition of scan, because I prefer to see whether anything was actually read in.
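
If choosing the type by hand through the numeric flag is inconvenient, one option is to let base R guess it. The following is a sketch of such a variant, not the function above: recreate_guess() is a hypothetical name, it relies on utils::type.convert(), and it defaults to quiet = TRUE purely to keep the example output short.

```r
# Hypothetical variant of recreateVector(): the type is guessed with
# utils::type.convert() instead of being chosen through a numeric flag.
recreate_guess <- function(X, quiet = TRUE) {
  X <- scan(what = character(), text = X, quiet = quiet)
  X <- sub("^\\s*\\[\\d+\\]", "", X)
  X <- X[X != ""]
  # as.is = TRUE keeps strings as character instead of converting to factor
  type.convert(X, as.is = TRUE)
}

recreate_guess("[1]  0.91897737  0.78213630  0.07456498")
recreate_guess('[1] "G" "J" "O"')
```

One caveat with guessing: type.convert() reads strings such as "T" and "F" as logical values, so a character vector made up only of those letters would come back as logical.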

Upvotes: 2

Nicolas2

Reputation: 2210

We can mimic the guess on data type done when reading CSV files:

library(tidyverse)
unprint <- function(s) {
  s %>% str_replace_all(" *\\[\\d+\\] *","") %>% str_replace_all(" +","\n") %>% 
  textConnection %>% read.table
}
unprint(' [1]  0.91897737  0.78213630  0.07456498 -1.98935170  0.61982575
 [6] -0.05612874 -0.15579551 -1.47075238 -0.47815006  0.41794156
[11]  1.35867955 -0.10278773  0.38767161 -0.05380504 -1.37705956
[16] -0.41499456 -0.39428995 -0.05931340  1.10002537  0.76317575') %>% head

#           V1
#1  0.91897737
#2  0.78213630
#3  0.07456498
#4 -1.98935170
#5  0.61982575
#6 -0.05612874


unprint(' [1] "G" "J" "O" "X" "F" "X" "Y" "R" "Q" "B" "F" "E" "R" "J" "U" "M" "S"
[18] "Z" "J" "U" "Y" "F" "Q" "D" "G" "K" "A" "J" "W" "I" "M" "P" "M" "E"
[35] "V" "R" "U" "C" "S" "K"') %>% head

#  V1
#1  G
#2  J
#3  O
#4  X
#5  F
#6  X

A more elaborate version that handles brackets inside strings. It also returns the correct kind of output: a vector, not a data frame.

unprint <- function(s) {
  t <- s %>% textConnection %>% readLines %>% 
    str_replace(" *\\[\\d+\\] *","") %>%
    paste(collapse=' ') %>% str_replace_all(" ","\n") %>% 
    textConnection %>% read.table(stringsAsFactors=FALSE) 
  t$V1 %>% str_replace_all("\n"," ")
}

x <- unprint(' [1] "x + y  [1]" "x + z  [2]"')
x
#[1] "x + y  [1]" "x + z  [2]"

Upvotes: 2
