Kevin Lee

Reputation: 361

Why might one use the unlist() function? (example inside)

I'm learning the basics of R and I'm going through an example where the user loads a .csv file containing the weights of mice fed a Normal Control or High Fat diet.

He then creates two vectors (is that right, once the column is extracted and unlisted?)

I'm confused about what purpose the unlist function serves here. I've also seen unlist used before graphing, and I don't understand what difference it makes.

Upvotes: 0

Views: 622

Answers (3)

MarBlo

Reputation: 4534

The purpose of unlist() is to flatten a list of vectors into a single vector. This description is from R for Data Science, which is well worth reading.

See the comments in the code below for further explanation.

library(tidyverse)


head(data)
#>   Diet Bodyweight
#> 1 chow      21.51
#> 2 chow      28.14
#> 3 chow      24.04
#> 4 chow      23.45
#> 5 chow      23.68
#> 6 chow      19.79
# without unlist you get a data.frame
dplyr::filter(data, Diet == 'chow') %>% select(Bodyweight) %>% class()
#> [1] "data.frame"

# by unlisting you get a named vector; the names are built from the column name plus the element's position
dplyr::filter(data, Diet == 'chow') %>% select(Bodyweight) %>% unlist()
#>  Bodyweight1  Bodyweight2  Bodyweight3  Bodyweight4  Bodyweight5  Bodyweight6 
#>        21.51        28.14        24.04        23.45        23.68        19.79 
#>  Bodyweight7  Bodyweight8  Bodyweight9 Bodyweight10 Bodyweight11 Bodyweight12 
#>        28.40        20.98        22.51        20.10        26.91        26.25

# With use.names = FALSE you get a plain, unnamed vector of the selected values
dplyr::filter(data, Diet == 'chow') %>% select(Bodyweight) %>% unlist(use.names = FALSE)
#>  [1] 21.51 28.14 24.04 23.45 23.68 19.79 28.40 20.98 22.51 20.10 26.91 26.25

Upvotes: 1

smingerson

Reputation: 1438

dplyr functions such as filter() and select() return an object of the same type as their input: a tibble (a variant of data.frame) if given a tibble, a data.frame if given a data.frame. Data frames and tibbles are a special type of list, where each element is a vector of the same length, but not necessarily the same type.

In the example given, each statement selects a single column, returned as a 1-column tibble. A 1-column tibble is a list with one element, in this case the vector of Bodyweights. However, many functions do not expect a 1-column tibble (or data.frame) and want a vector instead. By using unlist(), we squash the structure down to a single vector. This holds whether you select a single column or multiple columns, although with multiple columns the values are concatenated into one vector and coerced to a common type.
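To make the "a data frame is a list" point concrete, here is a minimal base R sketch. The data frame `df` and its values are made up for illustration and are not the mouse data from the question.

```r
# Hypothetical toy data frame (not the mouse data from the question)
df <- data.frame(
  weight = c(21.5, 28.1, 24.0),
  diet = c("chow", "chow", "hf"),
  stringsAsFactors = FALSE
)

is.list(df)   # a data frame is a list with one element per column
length(df)    # number of columns, i.e. number of list elements

# Selecting one column with single brackets keeps the data frame structure
one_col <- df["weight"]
is.data.frame(one_col)

# unlist() flattens that one-element list into a plain numeric vector
w <- unlist(one_col, use.names = FALSE)
is.numeric(w)
mean(w)

# With several columns, unlist() concatenates them and coerces to a common type;
# here the numbers are converted to character strings
both <- unlist(df, use.names = FALSE)
class(both)
length(both)
```

The last two lines are the reason to be careful when unlisting more than one column: mixing numeric and character columns silently produces a character vector.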

The idiomatic way in dplyr would be to pipe into pull(Bodyweight) instead of select(Bodyweight) followed by unlist().
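As a rough sketch of that comparison, assuming a small made-up tibble `mice_toy` in place of the real data, pull() and unlist(use.names = FALSE) should give the same unnamed vector:

```r
library(dplyr)

# Hypothetical toy tibble standing in for the mouse data
mice_toy <- tibble(
  Diet = c("chow", "chow", "hf"),
  Bodyweight = c(21.51, 28.14, 25.71)
)

# pull() extracts the column directly as a vector
via_pull <- mice_toy %>%
  filter(Diet == "chow") %>%
  pull(Bodyweight)

# select() keeps a 1-column tibble, which unlist() then flattens
via_unlist <- mice_toy %>%
  filter(Diet == "chow") %>%
  select(Bodyweight) %>%
  unlist(use.names = FALSE)

via_pull
identical(via_pull, via_unlist)
```

pull() also avoids the generated names (Bodyweight1, Bodyweight2, ...) that unlist() attaches by default.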

Consider this simple example of the difference:

library(tidyverse)  # provides tibble() and select()
tib <- tibble(a = 1:5, b = letters[1:5])
select(tib, a)
class(select(tib, a))
# Notice the different printing and class when we unlist
unlist(select(tib, a))
class(unlist(select(tib, a)))

Upvotes: 3

mabreitling

Reputation: 606

Well, that just depends on what you want to achieve. Before the unlist() you'll end up with a data.frame (or, more specifically, a tibble in this example, because the data was read in as one and dplyr preserves that). When unlisting the single-column tibble you'll end up with an atomic (named) numeric vector, which behaves quite differently in some situations (the final rbind below is an example).

library(tidyverse)
mice <- structure(list(Diet=c("chow","chow","chow","chow","chow",
"chow","chow","chow","chow","chow","chow","chow","hf",
"hf","hf","hf","hf","hf","hf","hf","hf","hf","hf","hf"
),Bodyweight=c(21.51,28.14,24.04,23.45,23.68,19.79,28.4,
20.98,22.51,20.1,26.91,26.25,25.71,26.37,22.8,25.34,
24.97,28.14,29.58,30.92,34.02,21.9,31.53,20.73)),class=c("spec_tbl_df",
"tbl_df","tbl","data.frame"),row.names=c(NA,-24L),spec=structure(list(
cols=list(Diet=structure(list(),class=c("collector_character",
"collector")),Bodyweight=structure(list(),class=c("collector_double",
"collector"))),default=structure(list(),class=c("collector_guess",
"collector")),skip=1),class="col_spec"))

bodyweight <- mice %>% filter(Diet == "chow") %>% select(Bodyweight)
class(bodyweight)
#> [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
bodyweight
#> # A tibble: 12 x 1
#>    Bodyweight
#>         <dbl>
#>  1       21.5
#>  2       28.1
#>  3       24.0
#>  4       23.4
#>  5       23.7
#>  6       19.8
#>  7       28.4
#>  8       21.0
#>  9       22.5
#> 10       20.1
#> 11       26.9
#> 12       26.2

bodyweight_unl <- mice %>% filter(Diet == "chow") %>% select(Bodyweight) %>% unlist
class(bodyweight_unl)
#> [1] "numeric"
bodyweight_unl
#>  Bodyweight1  Bodyweight2  Bodyweight3  Bodyweight4  Bodyweight5  Bodyweight6 
#>        21.51        28.14        24.04        23.45        23.68        19.79 
#>  Bodyweight7  Bodyweight8  Bodyweight9 Bodyweight10 Bodyweight11 Bodyweight12 
#>        28.40        20.98        22.51        20.10        26.91        26.25

rbind(bodyweight, 1:12)
rbind(bodyweight_unl, 1:12)

Created on 2020-07-12 by the reprex package (v0.3.0)

Upvotes: 1
