I have the following dataframe in R my_df_test <- data.frame(V1 = c(1,2,1), V2 = c("A","B","A"), V3 = c("S1", "S1", "S2"), V4 = c("x","x","x"), V5 = c("y","y","y"), V6 = c("A", "B", "C"), V7 = c("D","E","F")) my_df_test V1 V2 V3 V4 V5 V6 V7 1 1 A S1 x y A D 2 2 B S1 x y B E 3 1 A S2 x y C F Now I want to check if the combination of values in V1 and V2, occurs multiple times in the df. In my example my_df lines 1 and 3 have the same values '1 A' and '1 A'. If this happens, I want the following output: > my_df_test V1 V2 V3 V4 V5 V6_S1 V6_S2 V7_S1 V7_S2 1 1 A S1, S2 x y A C D F 2 2 B S1 x y B NA E NA So basically two things have changed: V3 now contains the values of all the lines in the df in which the values in V1 and V2 are the same. They are seperated by a ',' there are new V6 & V7 columns that contains the original values of these columns The rest of the columns and values should stay the same. I have code that works when there is only 1 'V6' like column. However when there are multiple this doesn't work anymore. my_df_test %>% group_by(V1, V2) %>% mutate(new = paste0("V6_", V3), V3 = toString(V3)) %>% spread(new, V6) So my question is how my code should be adapted to mutate and spread multiple columns?

user1987607

Reputation: 2157

R - tidyr - mutate and spread multiple columns

I have the following dataframe in R

my_df_test <- data.frame(V1 = c(1,2,1), V2 = c("A","B","A"), V3 = c("S1", "S1", "S2"), V4 = c("x","x","x"), V5 = c("y","y","y"), V6 = c("A", "B", "C"), V7 = c("D","E","F"))

my_df_test
  V1 V2 V3 V4 V5 V6 V7
1  1  A S1  x  y  A  D
2  2  B S1  x  y  B  E
3  1  A S2  x  y  C  F

Now I want to check if the combination of values in V1 and V2, occurs multiple times in the df. In my example my_df lines 1 and 3 have the same values '1 A' and '1 A'. If this happens, I want the following output:

> my_df_test

   V1 V2     V3 V4 V5  V6_S1   V6_S2   V7_S1   V7_S2
 1  1  A S1, S2  x  y      A       C       D       F
 2  2  B     S1  x  y      B      NA       E      NA

So basically two things have changed:

V3 now contains the values of all the lines in the df in which the values in V1 and V2 are the same. They are seperated by a ','
there are new V6 & V7 columns that contains the original values of these columns

The rest of the columns and values should stay the same.

I have code that works when there is only 1 'V6' like column. However when there are multiple this doesn't work anymore.

my_df_test %>%
    group_by(V1, V2) %>%
    mutate(new = paste0("V6_", V3), V3 = toString(V3)) %>%
    spread(new, V6)

So my question is how my code should be adapted to mutate and spread multiple columns?

Upvotes: 2

Answers (3)

IceCreamToucan

Reputation: 28705

Another option would be to unite the V3 columns after using pivot_wider. I converted all the factor columns to character first because unite doesn't work well with factors.

my_df_test %>% 
  mutate_if(is.factor, as.character) %>% 
  pivot_wider(names_from = 'V3', values_from = c(V3, V6:V7)) %>% 
  unite(V3, starts_with('V3'), sep = ', ', na.rm = TRUE)

# # A tibble: 2 x 9
#      V1 V2    V4    V5    V3     V6_S1 V6_S2 V7_S1 V7_S2
#   <dbl> <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <chr>
# 1     1 A     x     y     S1, S2 A     C     D     F    
# 2     2 B     x     y     S1     B     NA    E     NA

Upvotes: 1

Mikko Marttila

Reputation: 11908

Rather than spread(), you can use the new pivot_wider() that was added in the recent tidyr 1.0.0 release. It has a values_from argument that allows you to specify multiple columns at once:

library(dplyr)
library(tidyr)

my_df_test %>% 
  group_by(V1, V2) %>% 
  mutate(new = V3, V3 = toString(V3)) %>% 
  pivot_wider(
    names_from  = new,
    values_from = c(V6, V7)
  )
#> # A tibble: 2 x 9
#> # Groups:   V1, V2 [4]
#>      V1 V2    V3     V4    V5    V6_S1 V6_S2 V7_S1 V7_S2
#>   <dbl> <fct> <chr>  <fct> <fct> <fct> <fct> <fct> <fct>
#> 1     1 A     S1, S2 x     y     A     C     D     F    
#> 2     2 B     S1     x     y     B     <NA>  E     <NA>

^{Created on 2019-09-18 by the reprex package (v0.3.0)}

Upvotes: 4

Ronak Shah

Reputation: 389235

gather the columns together, change the key value and then spread

library(dplyr)
library(tidyr)

my_df_test %>%
  gather(key, value, V6, V7) %>%
  mutate(key = paste(key, V3, sep = "_")) %>%
  group_by(V1, V2) %>%
  mutate(V3 = toString(unique(V3))) %>%
  spread(key, value)

#     V1 V2    V3     V4    V5    V6_S1 V6_S2 V7_S1 V7_S2
#  <dbl> <fct> <chr>  <fct> <fct> <chr> <chr> <chr> <chr>
#1     1 A     S1, S2 x     y     A     C     D     F    
#2     2 B     S1     x     y     B     NA    E     NA

Upvotes: 1

R - tidyr - mutate and spread multiple columns

Answers (3)

Related Questions