Dario Lacan
Dario Lacan

Reputation: 1140

What does the n_unique value mean, when skimming list variables?

I do not understand the meaning of the n_unique value in skim about list variables:

library(tidyverse)
library(skimr)

skim(starwars)

The following is part of the result, about the three list variables in the dataset:

detail of the result of skim(starwars)

Now, there are 10 different vehicles in the dataset, so it makes sense that n_unique is 11 (including the null case of Star Wars characters not using any vehicle). Characters can use from a min of zero vehicles (min_length) to a max of two different vehicles (max_length) all along the movies. There are also 16 starships, and characters can use from zero to five different starships, so all makes sense.

However, there are only seven movies. So, n_unique should be 7 and not 24. Also, it is true that a character can make an appearance in a minimum of one movie (min_length) to a max of all the seven movies (max_length).

Upvotes: 0

Views: 34

Answers (1)

bretauv
bretauv

Reputation: 8557

There are 7 values for individual films, but there are 24 unique elements if you compare the list elements between themselves.

For example if the first element is [The Phantom Menace, Revenge of the Sith] and the second element is [The Phantom Menace] then the two elements are different.

library(tidyverse)
library(skimr)

# Count unique individual films
starwars$films |> 
  unlist() |> 
  unique() |> 
  length()
#> [1] 7

# Count unique list elements
starwars$films |> 
  unique() |> 
  length()
#> [1] 24

Upvotes: 0

Related Questions