Reputation: 361
I'm learning the basics of R and I'm going through an example where the user loads a .csv file containing the weights of mice fed a Normal Control or High Fat diet.
He proceeds to make two vectors (is this true, once the columns are extracted and unlisted?).
I'm confused as to what purpose the unlist function serves here. I've seen the unlist function used before graphing as well, and I'm not sure what difference it makes.
Upvotes: 0
Views: 622
Reputation: 4534
The purpose of unlist() is to flatten a list of vectors into a single vector. This description is from R for Data Science, which is certainly worth reading.
See further explanations in the comments below.
library(tidyverse)
head(data)
#> Diet Bodyweight
#> 1 chow 21.51
#> 2 chow 28.14
#> 3 chow 24.04
#> 4 chow 23.45
#> 5 chow 23.68
#> 6 chow 19.79
# without unlist you get a data.frame
dplyr::filter(data, Diet == 'chow') %>% select(Bodyweight) %>% class()
#> [1] "data.frame"
# by unlisting you get a named vector with the names taken from the selected data
dplyr::filter(data, Diet == 'chow') %>% select(Bodyweight) %>% unlist()
#> Bodyweight1 Bodyweight2 Bodyweight3 Bodyweight4 Bodyweight5 Bodyweight6
#> 21.51 28.14 24.04 23.45 23.68 19.79
#> Bodyweight7 Bodyweight8 Bodyweight9 Bodyweight10 Bodyweight11 Bodyweight12
#> 28.40 20.98 22.51 20.10 26.91 26.25
# If you set use.names = FALSE you get an unnamed vector with the data you selected
dplyr::filter(data, Diet == 'chow') %>% select(Bodyweight) %>% unlist(use.names = FALSE)
#> [1] 21.51 28.14 24.04 23.45 23.68 19.79 28.40 20.98 22.51 20.10 26.91 26.25
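To see the flattening on a plain list rather than a data frame, here is a minimal base R sketch. The list `weights` and its values are made up for illustration and are not the asker's full data.

```r
# Hypothetical list holding two numeric vectors of different lengths
weights <- list(chow = c(21.51, 28.14), hf = c(25.71, 26.37, 22.80))

# Flatten into one named vector; each name is the element name plus a position
flat_named <- unlist(weights)
names(flat_named)

# Drop the names to get a plain numeric vector
flat_plain <- unlist(weights, use.names = FALSE)
flat_plain
```

The same rule explains the Bodyweight1, Bodyweight2, ... names above: a data frame column is just a named list element.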
Upvotes: 1
Reputation: 1438
dplyr functions such as filter() and select() return an object of the same type as their input, which here is a tibble (a variant of data.frame). Data frames and tibbles are a special type of list, where each element is a vector of the same length, but not necessarily of the same type.
In the example given, each statement selects a single column, which is returned as a 1-column tibble. A 1-column tibble is a list with one element, in this case the vector of Bodyweights. However, many functions do not accept a 1-column tibble (or data.frame) and expect a vector instead. By using unlist(), we squash the structure down to a single vector. This would be true whether you selected a single column or multiple columns.
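The points in the two paragraphs above can be checked in base R. This is only a sketch: the data frames `one_col` and `mixed` are hypothetical, and mean() stands in for "a function that wants a vector".

```r
# Hypothetical 1-column data frame
one_col <- data.frame(Bodyweight = c(21.51, 28.14, 24.04))

# A data frame really is a list underneath
is.list(one_col)

# mean() wants a vector: on a data frame it warns and returns NA
mean_of_df <- suppressWarnings(mean(one_col))

# After unlist() the same call works
mean_of_vec <- mean(unlist(one_col))

# With several columns of different types, unlist() coerces everything
# to one common type (here: character)
mixed <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
flat <- unlist(mixed)
class(flat)
```

Because of that coercion, unlisting more than one column is usually only sensible when the columns share a type.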
The idiomatic way in dplyr would be to pipe into pull(Bodyweight), as opposed to using unlist().
Consider this simple example of the difference:
library(dplyr)
tib <- tibble(a = 1:5, b = letters[1:5])
select(tib, a)
class(select(tib, a))
# Notice the different printing and class when we unlist
unlist(select(tib, a))
class(unlist(select(tib, a)))
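The pull() suggestion above can be sketched on the same toy tibble. This assumes dplyr is installed; the variable names `via_pull`, `via_unlist` and `via_dollar` are only illustrative.

```r
library(dplyr)

tib <- tibble(a = 1:5, b = letters[1:5])

# pull() extracts one column as a plain, unnamed vector
via_pull <- tib %>% pull(a)

# select() + unlist() reaches the same values once the names are dropped
via_unlist <- tib %>% select(a) %>% unlist(use.names = FALSE)

# Base R extraction with $ also gives the same vector
via_dollar <- tib$a

identical(via_pull, via_unlist)
identical(via_pull, via_dollar)
```

pull() states the intent (take one column out as a vector) directly, and avoids the generated names that unlist() adds by default.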
Upvotes: 3
Reputation: 606
Well, that just depends on what you want to achieve. Before the unlist() you have a data.frame (or, more specifically, a tibble in this example, because of the dplyr functionality applied to the data). After unlisting the single-column tibble you end up with an atomic (named) numeric vector, which behaves very differently in some situations (the final rbind below is an example).
library(tidyverse)
mice <- structure(list(Diet=c("chow","chow","chow","chow","chow",
"chow","chow","chow","chow","chow","chow","chow","hf",
"hf","hf","hf","hf","hf","hf","hf","hf","hf","hf","hf"
),Bodyweight=c(21.51,28.14,24.04,23.45,23.68,19.79,28.4,
20.98,22.51,20.1,26.91,26.25,25.71,26.37,22.8,25.34,
24.97,28.14,29.58,30.92,34.02,21.9,31.53,20.73)),class=c("spec_tbl_df",
"tbl_df","tbl","data.frame"),row.names=c(NA,-24L),spec=structure(list(
cols=list(Diet=structure(list(),class=c("collector_character",
"collector")),Bodyweight=structure(list(),class=c("collector_double",
"collector"))),default=structure(list(),class=c("collector_guess",
"collector")),skip=1),class="col_spec"))
bodyweight <- mice %>% filter(Diet == "chow") %>% select(Bodyweight)
class(bodyweight)
#> [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
bodyweight
#> # A tibble: 12 x 1
#> Bodyweight
#> <dbl>
#> 1 21.5
#> 2 28.1
#> 3 24.0
#> 4 23.4
#> 5 23.7
#> 6 19.8
#> 7 28.4
#> 8 21.0
#> 9 22.5
#> 10 20.1
#> 11 26.9
#> 12 26.2
bodyweight_unl <- mice %>% filter(Diet == "chow") %>% select(Bodyweight) %>% unlist
class(bodyweight_unl)
#> [1] "numeric"
bodyweight_unl
#> Bodyweight1 Bodyweight2 Bodyweight3 Bodyweight4 Bodyweight5 Bodyweight6
#> 21.51 28.14 24.04 23.45 23.68 19.79
#> Bodyweight7 Bodyweight8 Bodyweight9 Bodyweight10 Bodyweight11 Bodyweight12
#> 28.40 20.98 22.51 20.10 26.91 26.25
rbind(bodyweight, 1:12)
rbind(bodyweight_unl, 1:12)
Created on 2020-07-12 by the reprex package (v0.3.0)
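A couple of simpler cases where the two objects behave differently, as a base R sketch on a hypothetical three-row data frame (`one_col` and `as_vector` are illustrative names):

```r
# Hypothetical 1-column data frame and its unlisted counterpart
one_col <- data.frame(Bodyweight = c(21.51, 28.14, 24.04))
as_vector <- unlist(one_col, use.names = FALSE)

# length() counts columns for a data frame, but values for a vector
length(one_col)
length(as_vector)

# Single-bracket indexing keeps the container type
class(one_col[1])    # still a data.frame (the first column)
class(as_vector[1])  # a single number
```

So code that loops over or indexes the result sees one column in the first case and individual weights in the second.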
Upvotes: 1