J. Mini

Reputation: 1610

Is there a difference between selecting an object and selecting object[,]?

Take some big text file (say, this one) and read it in:

loc<-[file location]
txt<-read.delim(loc, header = FALSE,stringsAsFactors=FALSE)

If we paste it all together like this, we get a completely sensible output (I've only shown a bit of it):

> paste0(txt[,],collapse = "")
[1] "                    GNU GENERAL PUBLIC LICENSE                       Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies"

But if we use txt instead of txt[,], the output looks like the printed code for a vector and is full of backslashes (again, I've truncated).

> paste0(txt,collapse = "")
[1] "c(\"                    GNU GENERAL PUBLIC LICENSE\", \"                       Version 2, June 1991\", \" Copyright (C) 1989, 1991 Free Software Foundation, Inc.,\", \" 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA\", \" Everyone is permitted to copy and distribute verbatim copies\"

This implies that there's a difference between txt and txt[,]. But what is it?

Upvotes: 2

Views: 32

Answers (1)

duckmayr

Reputation: 16910

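This is because txt here is a data frame with one variable, which is also a list of length one, whereas txt[,] is just a vector (this happens only when txt has a single variable and that variable is a vector, because indexing a data frame with [ drops a single-column result to the column itself by default). paste() converts each argument with as.character(), and for a list that yields the deparsed representation of each element rather than its contents.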

Here is a smaller example to demonstrate:

dat <- data.frame(x = letters[1:3])
paste0(dat[,], collapse = "")
# [1] "abc"
paste0(dat, collapse = "")
# [1] "c(\"a\", \"b\", \"c\")"

Those backslashes are just escaping the internal quotation marks:

cat(paste0(dat, collapse = ""))
# c("a", "b", "c")

Now consider what happens if the dataframe had a second variable:

dat <- data.frame(x = letters[1:3], y = LETTERS[1:3])
paste0(dat[,], collapse = "")
# [1] "c(\"a\", \"b\", \"c\")c(\"A\", \"B\", \"C\")"
paste0(dat, collapse = "")
# [1] "c(\"a\", \"b\", \"c\")c(\"A\", \"B\", \"C\")"

Now we can see what is going on. When a data frame has only one variable, dat[,] returns a vector (the default drop behavior of [ for data frames), while with more than one variable it still returns a data frame, which is also a list:

dat <- data.frame(x = letters[1:3])
str(dat[,])
#  chr [1:3] "a" "b" "c"
dat <- data.frame(x = letters[1:3], y = LETTERS[1:3])
str(dat[,])
# 'data.frame': 3 obs. of  2 variables:
#  $ x: chr  "a" "b" "c"
#  $ y: chr  "A" "B" "C"
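
The vector-versus-data-frame difference comes from the drop argument of [ for data frames. A minimal sketch of that behavior follows; the names one and two are only illustrative, and stringsAsFactors = FALSE is set explicitly so the columns are character in any R version:

```r
one <- data.frame(x = letters[1:3], stringsAsFactors = FALSE)

# Default: a single-column result is dropped to a plain vector
is.data.frame(one[, ])                # FALSE
is.character(one[, ])                 # TRUE

# drop = FALSE keeps the data frame structure
is.data.frame(one[, , drop = FALSE])  # TRUE

# [[ ]] always extracts the column itself, however many columns there are
two <- data.frame(x = letters[1:3], y = LETTERS[1:3], stringsAsFactors = FALSE)
identical(one[[1]], two[[1]])         # TRUE
```

If the code has to behave the same way regardless of the number of columns, [[ ]] or an explicit drop = FALSE is probably the safer choice than relying on the default.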

Another example shows that this is general paste() behavior for lists:

l <- list(1:3)
l
# [[1]]
# [1] 1 2 3
paste(l)
# [1] "1:3"
paste(l[[1]])
# [1] "1" "2" "3"
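
The same text can be reproduced with as.character(), the conversion that paste() applies to its arguments. The sketch below rebuilds dat so it is self-contained (dat2 is a hypothetical second data frame added for illustration) and shows two ways to paste the contents without relying on the single-column special case:

```r
dat <- data.frame(x = letters[1:3], stringsAsFactors = FALSE)

# On a list, as.character() renders a multi-element column as its deparsed code
as.character(dat)
# [1] "c(\"a\", \"b\", \"c\")"

# Pull the column out explicitly, then paste
paste0(dat[[1]], collapse = "")
# [1] "abc"

# unlist() flattens all columns into one vector before pasting
dat2 <- data.frame(x = letters[1:3], y = LETTERS[1:3], stringsAsFactors = FALSE)
paste0(unlist(dat2), collapse = "")
# [1] "abcABC"
```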

Upvotes: 1
