Reputation: 2157
I have the following dataframe in R
my_df_test <- data.frame(V1 = c(1,2,1), V2 = c("A","B","A"), V3 = c("S1", "S1", "S2"), V4 = c("x","x","x"), V5 = c("y","y","y"), V6 = c("A", "B", "C"), V7 = c("D","E","F"))
my_df_test
V1 V2 V3 V4 V5 V6 V7
1 1 A S1 x y A D
2 2 B S1 x y B E
3 1 A S2 x y C F
Now I want to check if the combination of values in V1 and V2, occurs multiple times in the df. In my example my_df lines 1 and 3 have the same values '1 A' and '1 A'. If this happens, I want the following output:
> my_df_test
V1 V2 V3 V4 V5 V6_S1 V6_S2 V7_S1 V7_S2
1 1 A S1, S2 x y A C D F
2 2 B S1 x y B NA E NA
So basically two things have changed:
The rest of the columns and values should stay the same.
I have code that works when there is only 1 'V6' like column. However when there are multiple this doesn't work anymore.
my_df_test %>%
group_by(V1, V2) %>%
mutate(new = paste0("V6_", V3), V3 = toString(V3)) %>%
spread(new, V6)
So my question is how my code should be adapted to mutate and spread multiple columns?
Upvotes: 2
Views: 290
Reputation: 28705
Another option would be to unite
the V3 columns after using pivot_wider. I converted all the factor columns to character first because unite doesn't work well with factors.
my_df_test %>%
mutate_if(is.factor, as.character) %>%
pivot_wider(names_from = 'V3', values_from = c(V3, V6:V7)) %>%
unite(V3, starts_with('V3'), sep = ', ', na.rm = TRUE)
# # A tibble: 2 x 9
# V1 V2 V4 V5 V3 V6_S1 V6_S2 V7_S1 V7_S2
# <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 A x y S1, S2 A C D F
# 2 2 B x y S1 B NA E NA
Upvotes: 1
Reputation: 11908
Rather than spread()
, you can use the new pivot_wider()
that was added in the recent tidyr 1.0.0 release. It has a values_from
argument that allows you to specify multiple columns at once:
library(dplyr)
library(tidyr)
my_df_test %>%
group_by(V1, V2) %>%
mutate(new = V3, V3 = toString(V3)) %>%
pivot_wider(
names_from = new,
values_from = c(V6, V7)
)
#> # A tibble: 2 x 9
#> # Groups: V1, V2 [4]
#> V1 V2 V3 V4 V5 V6_S1 V6_S2 V7_S1 V7_S2
#> <dbl> <fct> <chr> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 1 A S1, S2 x y A C D F
#> 2 2 B S1 x y B <NA> E <NA>
Created on 2019-09-18 by the reprex package (v0.3.0)
Upvotes: 4
Reputation: 389235
gather
the columns together, change the key
value and then spread
library(dplyr)
library(tidyr)
my_df_test %>%
gather(key, value, V6, V7) %>%
mutate(key = paste(key, V3, sep = "_")) %>%
group_by(V1, V2) %>%
mutate(V3 = toString(unique(V3))) %>%
spread(key, value)
# V1 V2 V3 V4 V5 V6_S1 V6_S2 V7_S1 V7_S2
# <dbl> <fct> <chr> <fct> <fct> <chr> <chr> <chr> <chr>
#1 1 A S1, S2 x y A C D F
#2 2 B S1 x y B NA E NA
Upvotes: 1