Cristhian

Reputation: 371

Using PCA in tidyverse framework with grouped data

# Libraries
library(tidyverse)
library(broom)

I know that the tidyverse can be used to perform PCA. Here are two examples:

Example 1:

iris_pca_v1 <- iris %>% 
  nest() %>% 
  mutate(
    pca = map(data, ~prcomp(.x %>% select(-Species), center = T, scale = T)),
    pca_aug = map2(pca, data, ~augment(.x, data = .y))
  )

Example 2:

iris_pca_v2 <- iris %>% 
  select(-Species) %>%
  prcomp(center = T, scale = T)

But I'd like to know if there is a way to use the same tidyverse framework to apply PCA to grouped data. Suppose that I need a separate set of PCs for each Species.

Note: In my real case I'm working with 20 variables for 50 states over 10 years. I'd like to apply PCA to build an index for each state, compressing the 20 variables into a single index over the 10 years.

Upvotes: 1

Views: 550

Answers (1)

stefan

Reputation: 124393

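Maybe this is what you are looking for. To get separate principal components per group, you could nest (or group) the data by Species, so that prcomp is fitted once for each species instead of once for the whole data set.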

library(tidyverse)
library(broom)

iris_pca_v1 <- iris %>% 
  nest() %>% 
  mutate(
    pca = map(data, ~prcomp(.x %>% select(-Species), center = T, scale = T)),
    pca_aug = map2(pca, data, ~augment(.x, data = .y))
  ) %>% 
  unnest(pca_aug) %>% 
  select(-data, -pca)
#> Warning: `...` must not be empty for ungrouped data frames.
#> Did you want `data = everything()`?

iris_pca_v2 <- iris %>% 
  nest(data = -Species) %>% 
  mutate(
    pca = map(data, ~ prcomp(.x, center = T, scale = T)),
    pca_aug = map2(pca, data, ~augment(.x, data = .y))
  ) %>% 
  unnest(pca_aug) %>% 
  select(-data, -pca)

ggplot() +
  geom_point(data = iris_pca_v1, aes(.fittedPC1, .fittedPC2, color = "iris_pca_v1")) +
  geom_point(data = iris_pca_v2, aes(.fittedPC1, .fittedPC2, color = "iris_pca_grouped")) +
  scale_color_manual(values = c(iris_pca_v1 = "black", iris_pca_grouped = "red" )) +
  facet_wrap(~Species)
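
For the state-by-year case described in the question, the same nesting pattern should carry over. Below is a minimal sketch, not a tested solution for the real data: the `panel` data frame, its column names (`state`, `year`, `x1` to `x4`) and the simulated values are all made up for illustration, with 3 states and 4 variables standing in for 50 states and 20 variables. It assumes the first principal component is what you want to use as the index.

```r
library(tidyverse)
library(broom)

set.seed(1)

# Hypothetical panel data: one row per state and year (names and values are invented)
panel <- expand_grid(state = c("A", "B", "C"), year = 2001:2010) %>%
  mutate(x1 = rnorm(n()), x2 = rnorm(n()), x3 = rnorm(n()), x4 = rnorm(n()))

state_index <- panel %>%
  # One nested data frame per state, holding that state's years and variables
  nest(data = -state) %>%
  mutate(
    # Fit a separate PCA per state; year is an identifier, so it is left out
    pca = map(data, ~ prcomp(select(.x, -year), center = TRUE, scale. = TRUE)),
    # Attach the PC scores to the original rows of each state
    pca_aug = map2(pca, data, ~ augment(.x, data = .y))
  ) %>%
  unnest(pca_aug) %>%
  # Keep the first principal component as the per-state index
  select(state, year, index = .fittedPC1)

state_index
```

Two things to keep in mind with this approach: each state gets its own loadings, and the sign of a principal component is arbitrary, so the index values are not directly comparable across states. Also, with only 10 yearly observations per state, at most 9 components can carry any variance, however many variables go in.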

Upvotes: 3
