jsingh
jsingh

Reputation: 209

Convert column to comma separated in R

I have two columns A and B in excel with large data.we have to consider both columns A and B, I am trying to achieve column C as output. Right now I am doing everything in excel. So I think there may a way to this in R but really don't know how to do it.Any help is appreciated..Thanks I have

 Column A   ColumnB    Column C(output column)
    A1         10           A2
    A2         10           A1
    B1         3         B2,B3,B4
    B2         3         B1,B3,B4
    B3         3         B1,B2,B4
    B4         3         B1,B2,B3
    C1         6          C2,C3
    C2         6          C1,C3
    C3         6          C1,C2

Upvotes: 0

Views: 3531

Answers (5)

AntoniosK
AntoniosK

Reputation: 16121

df = read.table(text = "
ColumnA   ColumnB   
A1         10          
A2         10          
B1         3        
B2         3        
B3         3        
B4         3        
C1         6        
C2         6        
C3         6        
", header=T, stringsAsFactors=F)

library(tidyverse)

df %>%
  group_by(ColumnB) %>%                                         # for each ColumnB value
  mutate(vals = list(ColumnA),                                  # create a list of all Column A values for each row
         vals = map2(vals, ColumnA, ~.x[.x != .y]),             # exclude the value in Column A from that list
         vals = map_chr(vals, ~paste0(.x, collapse = ","))) %>% # combine remaining values in the list                                        
  ungroup()                                                     # forget the grouping

# # A tibble: 9 x 3
#   ColumnA ColumnB vals    
#   <chr>     <int> <chr>   
# 1 A1           10 A2      
# 2 A2           10 A1      
# 3 B1            3 B2,B3,B4
# 4 B2            3 B1,B3,B4
# 5 B3            3 B1,B2,B4
# 6 B4            3 B1,B2,B3
# 7 C1            6 C2,C3   
# 8 C2            6 C1,C3   
# 9 C3            6 C1,C2

Upvotes: 0

www
www

Reputation: 39154

Another version of tidyverse solution. The separate function is handy to separate an existing column to new columns. By doing this, we can create the Group column to make sure all the operation would be within each group. map2 and map function are ideal to do vectorized operation. dat2 is the final output.

library(tidyverse)

dat2 <- dat %>%
  separate(ColumnA, into = c("Group", "Number"), remove = FALSE, convert = TRUE, sep = 1) %>%
  group_by(Group) %>%
  mutate(List = list(ColumnA)) %>%
  mutate(List = map2(List, ColumnA, ~.x[!(.x %in% .y)])) %>%
  mutate(ColumnC = map_chr(List, ~str_c(.x, collapse = ","))) %>%
  ungroup() %>%
  select(starts_with("Column"))
dat2
# # A tibble: 9 x 3
#   ColumnA ColumnB ColumnC 
#   <chr>     <int> <chr>   
# 1 A1           10 A2      
# 2 A2           10 A1      
# 3 B1            3 B2,B3,B4
# 4 B2            3 B1,B3,B4
# 5 B3            3 B1,B2,B4
# 6 B4            3 B1,B2,B3
# 7 C1            6 C2,C3   
# 8 C2            6 C1,C3   
# 9 C3            6 C1,C2 

DATA

dat <- read.table(text = "ColumnA   ColumnB
    A1         10 
                  A2         10 
                  B1         3
                  B2         3
                  B3         3
                  B4         3
                  C1         6
                  C2         6
                  C3         6",
                  stringsAsFactors = FALSE, header = TRUE)

Upvotes: 0

Onyambu
Onyambu

Reputation: 79198

We can group by column B then find a set difference between the current column A character and the whole characters in the group:

library(tidyverse)
df %>%
  group_by(ColumnB) %>%
  mutate(ColumnC=map_chr(ColumnA, ~toString(setdiff(ColumnA, .x))))

# A tibble: 9 x 3
# Groups:   ColumnB [3]
  ColumnA ColumnB ColumnC   
  <fct>     <int> <chr>     
1 A1           10 A2        
2 A2           10 A1        
3 B1            3 B2, B3, B4
4 B2            3 B1, B3, B4
5 B3            3 B1, B2, B4
6 B4            3 B1, B2, B3
7 C1            6 C2, C3    
8 C2            6 C1, C3    
9 C3            6 C1, C2    

Upvotes: 3

Josh Kay
Josh Kay

Reputation: 66

My understanding is to find all OTHER entries of column A that share the current value of column B

Grouping by B, and finding all A's associated with the value should do the trick (some clean-up afterward removes the current entry of A from the resulting column C)

a <- c("a1", "a2","b1", "b2","b3", "b4","c1","c2","c3","d1")

b <- c(10,10,3,3,3,3,6,6,6,5)

dta <- data.frame(a,b, stringsAsFactors = F)

dta<-dta %>% 
group_by(b) %>% 
mutate(c = paste0(a,collapse = ",")) %>% 
ungroup() %>% 
mutate(c = str_replace(c,pattern = paste0(",",a),replacement = "")) %>% 
mutate(c = str_replace(c,pattern = paste0(a,","),replacement = "")) %>% 
mutate(c = ifelse(c==a,NA,c))

Upvotes: 0

Calum You
Calum You

Reputation: 15062

I don't think the question is phrased very clearly but I am interpreting the desired results to be that you want Column C to have all the values from each group of Column B, leaving out the value of Column A. You can do this as follows:

  1. nest Column A and join it back onto the original data frame
  2. flatten it so you now have a vector of the Column A values
  3. use setdiff to get the values that are not Column A
  4. collapse into comma separated string with str_c

You can see that your desired Column C is reproduced.

library(tidyverse)
tbl <- structure(list(ColumnA = c("A1", "A2", "B1", "B2", "B3", "B4", "C1", "C2", "C3"), ColumnB = c(10L, 10L, 3L, 3L, 3L, 3L, 6L, 6L, 6L), ColumnC = c("A2", "A1", "B2,B3,B4", "B1,B3,B4", "B1,B2,B4", "B1,B2,B3", "C2,C3", "C1,C3", "C1,C2")), problems = structure(list(row = 9L, col = "ColumnC", expected = "", actual = "embedded null", file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ColumnA = structure(list(), class = c("collector_character", "collector")), ColumnB = structure(list(), class = c("collector_integer", "collector")), ColumnC = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

tbl %>%
  left_join(
    tbl %>% select(-ColumnC) %>% nest(ColumnA)
  ) %>%
  mutate(
    data = flatten(data),
    output = map2(data, ColumnA, ~ setdiff(.x, .y)),
    output = map_chr(output, ~ str_c(., collapse = ","))
    )
#> Joining, by = "ColumnB"
#> # A tibble: 9 x 5
#>   ColumnA ColumnB ColumnC  data      output  
#>   <chr>     <int> <chr>    <list>    <chr>   
#> 1 A1           10 A2       <chr [2]> A2      
#> 2 A2           10 A1       <chr [2]> A1      
#> 3 B1            3 B2,B3,B4 <chr [4]> B2,B3,B4
#> 4 B2            3 B1,B3,B4 <chr [4]> B1,B3,B4
#> 5 B3            3 B1,B2,B4 <chr [4]> B1,B2,B4
#> 6 B4            3 B1,B2,B3 <chr [4]> B1,B2,B3
#> 7 C1            6 C2,C3    <chr [3]> C2,C3   
#> 8 C2            6 C1,C3    <chr [3]> C1,C3   
#> 9 C3            6 C1,C2    <chr [3]> C1,C2

Created on 2018-08-21 by the reprex package (v0.2.0).

Upvotes: 2

Related Questions