Reputation: 209
I have two columns A and B in excel with large data.we have to consider both columns A and B, I am trying to achieve column C as output. Right now I am doing everything in excel. So I think there may a way to this in R but really don't know how to do it.Any help is appreciated..Thanks I have
Column A ColumnB Column C(output column)
A1 10 A2
A2 10 A1
B1 3 B2,B3,B4
B2 3 B1,B3,B4
B3 3 B1,B2,B4
B4 3 B1,B2,B3
C1 6 C2,C3
C2 6 C1,C3
C3 6 C1,C2
Upvotes: 0
Views: 3531
Reputation: 16121
df = read.table(text = "
ColumnA ColumnB
A1 10
A2 10
B1 3
B2 3
B3 3
B4 3
C1 6
C2 6
C3 6
", header=T, stringsAsFactors=F)
library(tidyverse)
df %>%
group_by(ColumnB) %>% # for each ColumnB value
mutate(vals = list(ColumnA), # create a list of all Column A values for each row
vals = map2(vals, ColumnA, ~.x[.x != .y]), # exclude the value in Column A from that list
vals = map_chr(vals, ~paste0(.x, collapse = ","))) %>% # combine remaining values in the list
ungroup() # forget the grouping
# # A tibble: 9 x 3
# ColumnA ColumnB vals
# <chr> <int> <chr>
# 1 A1 10 A2
# 2 A2 10 A1
# 3 B1 3 B2,B3,B4
# 4 B2 3 B1,B3,B4
# 5 B3 3 B1,B2,B4
# 6 B4 3 B1,B2,B3
# 7 C1 6 C2,C3
# 8 C2 6 C1,C3
# 9 C3 6 C1,C2
Upvotes: 0
Reputation: 39154
Another version of tidyverse
solution. The separate
function is handy to separate an existing column to new columns. By doing this, we can create the Group
column to make sure all the operation would be within each group. map2
and map
function are ideal to do vectorized operation. dat2
is the final output.
library(tidyverse)
dat2 <- dat %>%
separate(ColumnA, into = c("Group", "Number"), remove = FALSE, convert = TRUE, sep = 1) %>%
group_by(Group) %>%
mutate(List = list(ColumnA)) %>%
mutate(List = map2(List, ColumnA, ~.x[!(.x %in% .y)])) %>%
mutate(ColumnC = map_chr(List, ~str_c(.x, collapse = ","))) %>%
ungroup() %>%
select(starts_with("Column"))
dat2
# # A tibble: 9 x 3
# ColumnA ColumnB ColumnC
# <chr> <int> <chr>
# 1 A1 10 A2
# 2 A2 10 A1
# 3 B1 3 B2,B3,B4
# 4 B2 3 B1,B3,B4
# 5 B3 3 B1,B2,B4
# 6 B4 3 B1,B2,B3
# 7 C1 6 C2,C3
# 8 C2 6 C1,C3
# 9 C3 6 C1,C2
DATA
dat <- read.table(text = "ColumnA ColumnB
A1 10
A2 10
B1 3
B2 3
B3 3
B4 3
C1 6
C2 6
C3 6",
stringsAsFactors = FALSE, header = TRUE)
Upvotes: 0
Reputation: 79198
We can group by column B then find a set difference between the current column A character and the whole characters in the group:
library(tidyverse)
df %>%
group_by(ColumnB) %>%
mutate(ColumnC=map_chr(ColumnA, ~toString(setdiff(ColumnA, .x))))
# A tibble: 9 x 3
# Groups: ColumnB [3]
ColumnA ColumnB ColumnC
<fct> <int> <chr>
1 A1 10 A2
2 A2 10 A1
3 B1 3 B2, B3, B4
4 B2 3 B1, B3, B4
5 B3 3 B1, B2, B4
6 B4 3 B1, B2, B3
7 C1 6 C2, C3
8 C2 6 C1, C3
9 C3 6 C1, C2
Upvotes: 3
Reputation: 66
My understanding is to find all OTHER entries of column A that share the current value of column B
Grouping by B, and finding all A's associated with the value should do the trick (some clean-up afterward removes the current entry of A from the resulting column C)
a <- c("a1", "a2","b1", "b2","b3", "b4","c1","c2","c3","d1")
b <- c(10,10,3,3,3,3,6,6,6,5)
dta <- data.frame(a,b, stringsAsFactors = F)
dta<-dta %>%
group_by(b) %>%
mutate(c = paste0(a,collapse = ",")) %>%
ungroup() %>%
mutate(c = str_replace(c,pattern = paste0(",",a),replacement = "")) %>%
mutate(c = str_replace(c,pattern = paste0(a,","),replacement = "")) %>%
mutate(c = ifelse(c==a,NA,c))
Upvotes: 0
Reputation: 15062
I don't think the question is phrased very clearly but I am interpreting the desired results to be that you want Column C to have all the values from each group of Column B, leaving out the value of Column A. You can do this as follows:
nest
Column A and join it back onto the original data frameflatten
it so you now have a vector of the Column A valuessetdiff
to get the values that are not Column Astr_c
You can see that your desired Column C is reproduced.
library(tidyverse)
tbl <- structure(list(ColumnA = c("A1", "A2", "B1", "B2", "B3", "B4", "C1", "C2", "C3"), ColumnB = c(10L, 10L, 3L, 3L, 3L, 3L, 6L, 6L, 6L), ColumnC = c("A2", "A1", "B2,B3,B4", "B1,B3,B4", "B1,B2,B4", "B1,B2,B3", "C2,C3", "C1,C3", "C1,C2")), problems = structure(list(row = 9L, col = "ColumnC", expected = "", actual = "embedded null", file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(ColumnA = structure(list(), class = c("collector_character", "collector")), ColumnB = structure(list(), class = c("collector_integer", "collector")), ColumnC = structure(list(), class = c("collector_character", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
left_join(
tbl %>% select(-ColumnC) %>% nest(ColumnA)
) %>%
mutate(
data = flatten(data),
output = map2(data, ColumnA, ~ setdiff(.x, .y)),
output = map_chr(output, ~ str_c(., collapse = ","))
)
#> Joining, by = "ColumnB"
#> # A tibble: 9 x 5
#> ColumnA ColumnB ColumnC data output
#> <chr> <int> <chr> <list> <chr>
#> 1 A1 10 A2 <chr [2]> A2
#> 2 A2 10 A1 <chr [2]> A1
#> 3 B1 3 B2,B3,B4 <chr [4]> B2,B3,B4
#> 4 B2 3 B1,B3,B4 <chr [4]> B1,B3,B4
#> 5 B3 3 B1,B2,B4 <chr [4]> B1,B2,B4
#> 6 B4 3 B1,B2,B3 <chr [4]> B1,B2,B3
#> 7 C1 6 C2,C3 <chr [3]> C2,C3
#> 8 C2 6 C1,C3 <chr [3]> C1,C3
#> 9 C3 6 C1,C2 <chr [3]> C1,C2
Created on 2018-08-21 by the reprex package (v0.2.0).
Upvotes: 2