Rstudent

Reputation: 885

How to identify similar rows in a data frame?

I am trying to compare the rows of df1 with the rows of df2 and count how often the matching rows occur. My df1 and df2 look like this:

var1 = c(1, 2, 3, 4, 5, 6, 7) 
var2 = c(1, 1, 2, 3, 4, 5, 6) 
value = c(0, 0.75, 0.51, 0.42, 0.31, 0.22, 0.11)
freq = c(1,1,1,1,1,1,1) 
df1 = data.frame(var1, var2, value, freq)

var1 = c(1, 2, 3, 4, 5, 6, 7) 
var2 = c(1, 2, 3, 5, 4, 6, 8) 
value = c(0, 0.75, 0.42, 0.41, 0.31, 0, 0)
freq = c(1,1,1,1,1,1,1) 
df2 = data.frame(var1, var2, value, freq)

So I would like a df3 containing the rows that appear in both df1 and df2.

For the example above, df3 would be:

var1=c(1,5)
var2=c(1,4)
value=c(0,0.31) 
freq=c(1,1)
df3=data.frame(var1, var2, value, freq)

Upvotes: 0

Views: 163

Answers (2)

IceCreamToucan

Reputation: 28705

Without the frequency part, this is just a merge with default settings (i.e. an inner join on all columns the two data frames share). To get the frequencies, count the rows in each data frame after grouping by all variables, then inner_join the two results (inner_join is the dplyr equivalent of merge) and add the two counts together.

I added two extra copies of the first row to df1 just to check that the count part works as intended.

merge(df1, df2)
#    var1 var2 value
# 1:    1    1  0.00
# 2:    5    4  0.31

library(dplyr)

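# add two extra copies of the first row of df1 to test the counting
df1 <- df1[c(1, 1, seq(nrow(df1))),]

df1 %>% 
  group_by_all %>% 
  count(name = 'n1') %>%    # n1: how often each distinct row occurs in df1
  inner_join(
    df2 %>% 
      group_by_all %>% 
      count(name = 'n2')    # n2: the same count for df2
  ) %>%                     # no `by` given, so the join uses all shared columns
  mutate(n = n1 + n2) %>%   # combined frequency across both data frames
  select(-n1, -n2)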

# # A tibble: 2 x 4
# # Groups:   var1, var2, value [2]
#    var1  var2 value     n
#   <dbl> <dbl> <dbl> <int>
# 1     1     1  0        4
# 2     5     4  0.31     2
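
For anyone who would rather avoid the dplyr dependency, the same count-then-join idea can be sketched in base R. This is only a sketch under a few assumptions: `count_rows` is a hypothetical helper name, not part of any package, and the freq column is left out so the result lines up with the output shown above.

```r
# Data from the question, with the first row of df1 repeated twice more
# (the same modification used above to exercise the counting).
df1 <- data.frame(
  var1  = c(1, 2, 3, 4, 5, 6, 7),
  var2  = c(1, 1, 2, 3, 4, 5, 6),
  value = c(0, 0.75, 0.51, 0.42, 0.31, 0.22, 0.11)
)
df1 <- df1[c(1, 1, seq_len(nrow(df1))), ]
df2 <- data.frame(
  var1  = c(1, 2, 3, 4, 5, 6, 7),
  var2  = c(1, 2, 3, 5, 4, 6, 8),
  value = c(0, 0.75, 0.42, 0.41, 0.31, 0, 0)
)

# count_rows() is a hypothetical helper: it returns one row per distinct
# combination of values, plus a column (named by `name`) with its count.
count_rows <- function(df, name) {
  out <- aggregate(rep(1L, nrow(df)), by = as.list(df), FUN = sum)
  names(out)[ncol(out)] <- name
  out
}

# merge() defaults to an inner join on all columns the inputs share
res <- merge(count_rows(df1, "n1"), count_rows(df2, "n2"))
res$n <- res$n1 + res$n2
res <- res[, c("var1", "var2", "value", "n")]
res
```

With this data the result should contain n = 4 for the row (1, 1, 0) and n = 2 for the row (5, 4, 0.31), the same as the dplyr version.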

Upvotes: 1

tickly potato

Reputation: 122

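Something like this? Note that df1 == df2 compares the two data frames position by position, so this only works when both have the same dimensions and matching rows sit at the same row index.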

library(dplyr)

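# keep the rows where every column matches at the same position in df1 and df2
df3 = df1[apply(df1 == df2, 1, all), ]
# freq is left out of the grouping so that summarise() can recompute it
df3 %>% group_by(var1, var2, value) %>% summarise(freq = n())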
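
One thing to keep in mind with the `df1 == df2` approach is that it is positional. A minimal sketch, using a made-up two-row example rather than the question's data, shows how it differs from merge() when the same rows appear in a different order:

```r
# Toy data (an assumption for illustration): df2 holds the same rows as df1,
# but in reverse order.
df1 <- data.frame(var1 = c(1, 5), var2 = c(1, 4), value = c(0, 0.31))
df2 <- df1[c(2, 1), ]

# Positional comparison: row i of df1 is only compared with row i of df2,
# so no match is found here.
positional <- df1[apply(df1 == df2, 1, all), ]
nrow(positional)

# merge() matches on the column values, so row order does not matter
by_value <- merge(df1, df2)
nrow(by_value)
```

If the two data frames are not guaranteed to be aligned row by row, a join on the shared columns is probably the safer choice.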

Upvotes: 0
