R Ban

Reputation: 97

Get the unique elements from the data frame by comparing two column values

I have extracted some elements with a regex and combined them. The final data frame has two columns, f1 and f2. For each row, I need the elements of f1 that do not appear in f2.


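# Three extracted values per row; in the second row, V1 is a single space
df <- as.data.frame(rbind(c('11061002', '11862192', '11083069'),
                          c(' ', '1234567', '452589')),
                    stringsAsFactors = FALSE)

# Combine the three columns into one comma-separated string
df$f1 <- paste0(df$V1, ',', df$V2, ',', df$V3)

# Values to compare against, one per row
df_1 <- as.data.frame(rbind(c('11862192'),
                            c('145')),
                      stringsAsFactors = FALSE)
names(df_1)[1] <- 'f2'

df <- cbind(df, df_1)

# Keep only the f1 and f2 columns
df <- df[, c(4, 5)]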

The expected output is a third column. For the first row it should be 11061002,11083069, because 11862192 is present in both f1 and f2. For the second row it should be ,1234567,452589, because 145 does not occur in that row's f1.

Please guide.

Upvotes: 0

Views: 36

Answers (2)

akrun

Reputation: 887951

We can use the tidyverse (dplyr and tidyr):

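library(dplyr)
library(tidyr)
df %>% 
  mutate(rn = row_number()) %>%   # keep track of the original row
  separate_rows(f1, f2) %>%       # one value per row in f1 and f2
  group_by(rn) %>% 
  # drop values that are in f2, then the empty string left by the leading blank
  summarise(new = toString(setdiff(setdiff(f1, f2), ""))) %>%
  select(-rn) %>% 
  bind_cols(df, .)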
# A tibble: 2 x 6
#  V1         V2       V3       f1                           f2       new               
#  <chr>      <chr>    <chr>    <chr>                        <chr>    <chr>             
#1 "11061002" 11862192 11083069 "11061002,11862192,11083069" 11862192 11061002, 11083069
#2 " "        1234567  452589   " ,1234567,452589"           145      1234567, 452589  
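The inner `setdiff(..., "")` is there because of how the leading blank in row 2 gets split. Assuming `separate_rows()` runs with its default `sep`, which is the regular expression `"[^[:alnum:].]+"`, base R's `strsplit()` with the same pattern shows the pieces for that row (a small illustration; `sep` and `pieces` are just local names chosen here):

```r
# Default separator of tidyr::separate_rows(): any run of characters
# that are neither alphanumeric nor a dot
sep <- "[^[:alnum:].]+"

# In row 2 of f1, " ," at the start is one match of the separator,
# so strsplit() returns an empty string as the first piece
pieces <- strsplit(" ,1234567,452589", sep)[[1]]
# pieces is c("", "1234567", "452589")

# Removing the f2 value and then the empty string leaves the wanted values
toString(setdiff(setdiff(pieces, "145"), ""))
# "1234567, 452589"
```

Without the second `setdiff()`, the result for row 2 would start with an empty element and a stray ", ".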

Upvotes: 0

Ronak Shah

Reputation: 389325

You can split each f1 string on `,` and use `setdiff` to keep the values that are not present in f2, after removing the blank (`' '`) entries.

mapply(function(x, y) toString(setdiff(x[x!=' '], y)), 
                      strsplit(df$f1, ','), df$f2)
#[1] "11061002, 11083069" "1234567, 452589" 
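To see what the function does for a single row, here is the second row worked through step by step (a sketch using the literal values from the question; `x` and `kept` are illustrative names):

```r
# Split row 2 of f1 on commas; the leading blank becomes its own element
x <- strsplit(" ,1234567,452589", ",")[[1]]
# x is c(" ", "1234567", "452589")

# Drop the single-space entry, then any values that appear in f2 ("145")
kept <- setdiff(x[x != " "], "145")

# toString() joins the remaining values with ", "
toString(kept)
# "1234567, 452589"
```

Note that `x != ' '` only removes an entry that is exactly one space. If the blank could also be an empty string or several spaces, filtering with `nzchar(trimws(x))` would cover those cases.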

If f2 can also contain several comma-separated values, split f2 as well:

mapply(function(x, y) toString(setdiff(x[x!=' '], y)), 
                      strsplit(df$f1, ','), strsplit(df$f2, ','))
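As a quick check of that variant, here it is on hypothetical input where the first row of f2 holds two values (the data below is made up for illustration and is not from the question):

```r
# Hypothetical input: row 1 of f2 now contains two comma-separated values
f1 <- c("11061002,11862192,11083069", " ,1234567,452589")
f2 <- c("11862192,11083069", "145")

# Same approach as above, with f2 split on commas too
res <- mapply(function(x, y) toString(setdiff(x[x != ' '], y)),
              strsplit(f1, ','), strsplit(f2, ','))
# res is c("11061002", "1234567, 452589")
```

Because each element of `strsplit(f2, ',')` is a vector, `setdiff()` removes every f2 value from the matching row of f1, not just the first one.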

Upvotes: 3
