Reputation: 3
I'm new here, so I've tried to make this explanation and reprex as simple as possible following the guidelines, but apologies for any mistakes or etiquette boobys I've made, though I've tried to avoid them - I'm learning!
So in R I have two columns, ID and Area. Neither of these are unique values. Sometimes ID will match up to more than one Area, as shown by "ABC" below.
data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c("area1", "area2", "area3", "area4", "area5", "area5"))
How do I create a third column that concatenates all possible entries of Area for each ID, so that it looks like:
data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c( "area1", "area2", "area3", "area4", "area5", "area5"),
AreaComb=c("area1 & area4", "area2", "area3", "area1 & area4", "area5", "area 5"))
I would greatly appreciate any help I could get with this. I am learning R using DataCamp and have not got this far yet!
Edit: I should have said that I need the concatenation to just include all possible UNIQUE values, ie, entries under ID=="jkl" should only have "area5" in AreaComb, not "area5 & area5", etc.
Upvotes: 0
Views: 63
Reputation: 389275
Using dplyr
, we can group_by
ID
and paste
Area
together for each group.
library(dplyr)
df %>%
group_by(ID) %>%
mutate(AreaComb = paste(unique(Area), collapse = " & "))
# ID Area AreaComb
# <fct> <fct> <chr>
#1 ABC area1 area1 & area4
#2 def area2 area2
#3 ghi area3 area3
#4 ABC area4 area1 & area4
#5 jkl area5 area5
data
df <- data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c("area1", "area2", "area3", "area4", "area5", "area5"))
Upvotes: 1
Reputation: 51592
Make sure your strings are character (i.e. NOT factors) and use ave
, i.e.
dd <- data.frame(ID = c("ABC", "def", "ghi", "ABC", "jkl"),
Area = c("area1", "area2", "area3", "area4", "area5"),
stringsAsFactors = FALSE) #<--- Notice this argument
with(dd, ave(Area, ID, FUN = function(i)paste(i, collapse = ' & ')))
#[1] "area1 & area4" "area2" "area3" "area1 & area4" "area5"
Upvotes: 1
Reputation: 887891
We can use tidyverse
library(dplyr)
library(stringr)
df %>%
group_by(ID) %>%
mutate(AreaComb = str_c(Area, collapse = " & "))
# A tibble: 5 x 3
# Groups: ID [4]
# ID Area AreaComb
# <fct> <fct> <chr>
#1 ABC area1 area1 & area4
#2 def area2 area2
#3 ghi area3 area3
#4 ABC area4 area1 & area4
#5 jkl area5 area5
Or with data.table
library(data.table)
setDT(df)[, AreaComb := paste(Area, collapse= " & "), by = ID]
df <- data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl"),
Area=c("area1", "area2", "area3", "area4", "area5"))
Upvotes: 1