Zanskobia
Zanskobia

Reputation: 3

Create a column with concatenated possible values of one column based on an ID column

I'm new here, so I've tried to make this explanation and reprex as simple as possible following the guidelines, but apologies for any mistakes or etiquette boobys I've made, though I've tried to avoid them - I'm learning!

So in R I have two columns, ID and Area. Neither of these are unique values. Sometimes ID will match up to more than one Area, as shown by "ABC" below.

data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c("area1", "area2", "area3", "area4", "area5", "area5"))

How do I create a third column that concatenates all possible entries of Area for each ID, so that it looks like:

data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c( "area1", "area2", "area3", "area4", "area5", "area5"),
AreaComb=c("area1 & area4", "area2", "area3", "area1 & area4", "area5", "area 5"))

I would greatly appreciate any help I could get with this. I am learning R using DataCamp and have not got this far yet!

Edit: I should have said that I need the concatenation to just include all possible UNIQUE values, ie, entries under ID=="jkl" should only have "area5" in AreaComb, not "area5 & area5", etc.

Upvotes: 0

Views: 63

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389275

Using dplyr, we can group_by ID and paste Area together for each group.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(AreaComb = paste(unique(Area), collapse = " & "))

#   ID    Area  AreaComb     
#  <fct> <fct> <chr>        
#1 ABC   area1 area1 & area4
#2 def   area2 area2        
#3 ghi   area3 area3        
#4 ABC   area4 area1 & area4
#5 jkl   area5 area5       

data

df <- data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
       Area=c("area1", "area2", "area3", "area4", "area5", "area5"))

Upvotes: 1

Sotos
Sotos

Reputation: 51592

Make sure your strings are character (i.e. NOT factors) and use ave, i.e.

dd <- data.frame(ID = c("ABC", "def", "ghi", "ABC", "jkl"),
                Area = c("area1", "area2", "area3", "area4", "area5"), 
                stringsAsFactors = FALSE) #<--- Notice this argument

with(dd, ave(Area, ID, FUN = function(i)paste(i, collapse = ' & ')))
#[1] "area1 & area4" "area2"         "area3"         "area1 & area4" "area5"    

Upvotes: 1

akrun
akrun

Reputation: 887891

We can use tidyverse

library(dplyr)
library(stringr)
df %>%
   group_by(ID) %>%
   mutate(AreaComb = str_c(Area, collapse = " & "))
# A tibble: 5 x 3
# Groups:   ID [4]
#  ID    Area  AreaComb     
#  <fct> <fct> <chr>        
#1 ABC   area1 area1 & area4
#2 def   area2 area2        
#3 ghi   area3 area3        
#4 ABC   area4 area1 & area4
#5 jkl   area5 area5        

Or with data.table

library(data.table)
setDT(df)[, AreaComb := paste(Area, collapse= " & "), by = ID]

data

df <- data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl"),
             Area=c("area1", "area2", "area3", "area4", "area5")) 

Upvotes: 1

Related Questions