Charyoshi

Reputation: 55

Comparing elements of a column across multiple dataframes to output number of matches in R

I am trying to compare a single column across multiple dataframes in R. The dataframes vary in length, and their non-numeric elements appear in a different order in each. My data consists of unique non-numeric elements from multiple samples, with each sample saved as its own dataframe and no duplicates within a dataframe. I would like to compare the dataframes to see which elements appear in more than one of them, and in how many dataframes each element appears.

Sample Data

df1 <- data.frame(names = rep(c("Nina", "Doug", "Alli", "Doug")))
df2 <- data.frame(names = rep(c("Steve", "Alli", "Nina")))
df3 <- data.frame(names = rep(c("Doug", "Steve", "Nina", "Bob")))

  df1      df2      df3
[names]  [names]  [names]
 Nina     Steve    Doug
 Doug     Alli     Steve
 Alli     Nina     Nina
 Doug              Bob

Now I would like to compare df1, df2, and df3 and get an output that tells me which names are shared across dataframes and how many times each one appears.

Output

Names  Matches

Nina   [3]
Doug   [3]
Alli   [2]
Steve  [2]
Bob    [1]

My real dataset has many more dataframes and names, so it would be a bonus if the output could be ordered from the names that appear in the most dataframes to those that appear in the fewest.

I am fairly new to R and not really sure how to even start tackling this. Currently I've created lists of the dataframes that I would like to compare to each other, but any suggestions are much appreciated. Thank you for your time!

Upvotes: 1

Views: 164

Answers (1)

coffeinjunky

Reputation: 11514

Try

df1 <- data.frame(names = rep(c("Nina", "Doug", "Alli", "Doug")), stringsAsFactors = FALSE)
df2 <- data.frame(names = rep(c("Steve", "Alli", "Nina")), stringsAsFactors = FALSE)
df3 <- data.frame(names = rep(c("Doug", "Steve", "Nina", "Bob")), stringsAsFactors = FALSE)

table(c(df1$names, df2$names, df3$names))

 Alli   Bob  Doug  Nina Steve 
    2     1     3     3     2 
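Since the question mentions that the dataframes are already collected in lists, the same idea can be sketched without typing out each dataframe by name. This is only a sketch under assumptions: `dfs` and `all_names` are names made up here, and every dataframe is assumed to have a column called `names`.

```r
# Sample data from the question
df1 <- data.frame(names = c("Nina", "Doug", "Alli", "Doug"), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Steve", "Alli", "Nina"), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("Doug", "Steve", "Nina", "Bob"), stringsAsFactors = FALSE)

# `dfs` is a hypothetical name for the list holding all dataframes
dfs <- list(df1, df2, df3)

# Extract the `names` column from each dataframe (as character)
# and flatten the pieces into one long character vector
all_names <- unlist(lapply(dfs, function(d) as.character(d$names)))

# Count how often each name occurs across all dataframes
counts <- table(all_names)
counts
```

Adding more dataframes then only means adding them to `dfs`; the counting code stays the same.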

Or, to get the counts as a data frame with labeled columns:

df <- data.frame(table(c(df1$names, df2$names, df3$names)))
names(df) <- c("Names", "Matches")

Including the ordering:

df[order(df$Matches, decreasing = TRUE),]
  Names Matches
3  Doug       3
4  Nina       3
1  Alli       2
5 Steve       2
2   Bob       1
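Note that `table()` counts every occurrence, so a name repeated within one dataframe (like "Doug" in `df1`) is counted more than once. If the goal is instead the number of dataframes each name appears in, one possible variation is to deduplicate within each dataframe first. This is a sketch, not part of the original answer; `dfs`, `per_df`, and `n_dfs` are illustrative names, and for the sample data it gives Doug a count of 2 rather than the 3 shown in the question.

```r
# Sample data from the question
df1 <- data.frame(names = c("Nina", "Doug", "Alli", "Doug"), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Steve", "Alli", "Nina"), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("Doug", "Steve", "Nina", "Bob"), stringsAsFactors = FALSE)

# Hypothetical list holding all dataframes
dfs <- list(df1, df2, df3)

# Keep each name at most once per dataframe
per_df <- lapply(dfs, function(d) unique(as.character(d$names)))

# Count in how many dataframes each name appears, most frequent first
n_dfs <- sort(table(unlist(per_df)), decreasing = TRUE)

# Same layout as the answer: a data frame with labeled columns
result <- data.frame(n_dfs)
names(result) <- c("Names", "Matches")
result
```

When no dataframe contains duplicates, as the question describes for the real data, both approaches give the same counts.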

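Make sure the names are character vectors, not factors; if they are factors, convert them with as.character() first. In older versions of R, calling c() on factors combines their underlying integer codes rather than their labels, which would make the counts meaningless.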
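As a small illustration of that conversion, here is a sketch using a made-up dataframe `df_f` whose column was deliberately created as a factor:

```r
# `df_f` is a hypothetical dataframe whose names column is a factor
df_f <- data.frame(names = c("Nina", "Doug", "Alli"), stringsAsFactors = TRUE)
was_factor <- is.factor(df_f$names)

# Convert the factor column to plain character strings
df_f$names <- as.character(df_f$names)
now_character <- is.character(df_f$names)
```

After the conversion, the column can be passed to `c()` and `table()` like the character columns above.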

Upvotes: 1
