Hellblazer

Reputation: 1

Can I use the unlist function in a dataframe?

I was working with a list containing the words from a text and the tags classifying them. I was supposed to restore an old letter, and to do this I needed to extract only the words into a vector. Instead of using sapply, I did this:

words <- unlist(data.frame(letter)[1,], use.names = FALSE)

It appeared to work, but the auxiliary professor said that doing this was a problem, since you can only use unlist on lists. I fixed it, but in the end the results were the same.

PS: I know that using sapply is more efficient; I just didn't remember the function. I'm just curious to know whether you can use unlist on other objects.

Upvotes: 0

Views: 273

Answers (1)

Ian Campbell

Reputation: 24790

As @Gregor notes, data.frames are lists. Consider the following example:

df <- data.frame(Col1 = LETTERS[1:5], Col2 = 1:5, stringsAsFactors = FALSE)
is.list(df)
#[1] TRUE

Therefore, you can use lapply on a data.frame to perform column-wise operations:

lapply(df,paste0, collapse = "")
#$Col1
#[1] "ABCDE"
#$Col2
#[1] "12345"

You have to be careful, however, when subsetting a data.frame, as you may not get a list depending on the method you use.

df["Col2"]
#  Col2
#1    1
#2    2
#3    3
#4    4
#5    5

is.list(df["Col2"])
#[1] TRUE

df[,"Col2"]
#[1] 1 2 3 4 5

is.list(df[,"Col2"])
#[1] FALSE

is.list(df[["Col2"]])
#[1] FALSE

is.list(df$Col2)
#[1] FALSE

is.list(subset(df,select = Col2))
#[1] TRUE
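
If a list-like result is needed from the comma form of column selection, passing drop = FALSE should keep the data.frame structure. A minimal sketch, recreating the df from above so the snippet runs on its own (the names dropped and kept are only illustrative):

```r
# Recreate the example data.frame so this snippet is self-contained
df <- data.frame(Col1 = LETTERS[1:5], Col2 = 1:5, stringsAsFactors = FALSE)

# Default behaviour: a single column selected with [, j] is dropped to a vector
dropped <- df[, "Col2"]

# drop = FALSE keeps the result as a one-column data.frame, which is still a list
kept <- df[, "Col2", drop = FALSE]

is.list(dropped)
#[1] FALSE
is.list(kept)
#[1] TRUE
```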

To my knowledge, however, subsetting whole rows of a data.frame with more than one column always returns a list.

df[1,]
#  Col1 Col2
#1    A    1

is.list(df[1,])
#[1] TRUE

is.list(subset(df,1:5 == 1))
#[1] TRUE
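
One edge case worth keeping in mind: when a data.frame has only a single column, df[i, ] drops to a plain vector by default, so the row is no longer a list unless drop = FALSE is supplied. A small sketch, using a hypothetical one-column data.frame called one_col that is not part of the original example:

```r
# Hypothetical one-column data.frame, used only for illustration
one_col <- data.frame(Col1 = LETTERS[1:5], stringsAsFactors = FALSE)

# With a single column, row subsetting drops to a plain character vector
row_default <- one_col[1, ]

# drop = FALSE keeps a one-row data.frame, which is still a list
row_kept <- one_col[1, , drop = FALSE]

is.list(row_default)
#[1] FALSE
is.list(row_kept)
#[1] TRUE
```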

We can use the dput function to view a text representation of the underlying structure of a single row:

dput(df[1,])
#structure(list(Col1 = "A", Col2 = 1L), row.names = 1L, class = "data.frame")

As we can see, even the single row is clearly a list. Therefore, we can reasonably unlist that row just as we would any list that is not also a data.frame.

unlist(df[1,], use.names = FALSE)
#[1] "A" "1"

unlist(list(Col1 = "A", Col2 = 1L), use.names = FALSE)
#[1] "A" "1"
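
Two details may be worth checking when relying on this. First, unlist has to return a single atomic vector, so a row that mixes character and integer columns comes back as character (note the "1" above). Second, on the question of other objects: calling unlist on something that is already an atomic vector simply returns it unchanged. A short sketch, where nums is a hypothetical numeric-only data.frame added for contrast:

```r
df <- data.frame(Col1 = LETTERS[1:5], Col2 = 1:5, stringsAsFactors = FALSE)

# Mixed character and integer columns are coerced to character
mixed <- unlist(df[1, ], use.names = FALSE)
class(mixed)
#[1] "character"

# An atomic vector is not a list; unlist() hands it back unchanged
atomic <- unlist(1:3)
identical(atomic, 1:3)
#[1] TRUE

# A row drawn from numeric-only columns stays numeric (hypothetical data)
nums <- data.frame(a = 1:2, b = c(0.5, 1.5))
numeric_row <- unlist(nums[1, ], use.names = FALSE)
numeric_row
#[1] 1.0 0.5
```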

Upvotes: 1
