Reputation: 55
I am trying to compare a single column across multiple dataframes in R. The dataframes vary in length, and the column holds non-numeric elements in no particular order. Each dataframe comes from one sample and contains that sample's unique non-numeric elements, with no duplicates within a dataframe. I would like to compare them against each other to see which elements appear in more than one dataframe, and in how many dataframes each element appears.
df1 <- data.frame(names = rep(c("Nina", "Doug", "Alli", "Doug")))
df2 <- data.frame(names = rep(c("Steve", "Alli", "Nina")))
df3 <- data.frame(names = rep(c("Doug", "Steve", "Nina", "Bob")))
df1      df2      df3
[names]  [names]  [names]
Nina     Steve    Doug
Doug     Alli     Steve
Alli     Nina     Nina
Doug              Bob
Now I would like to compare df1, df2 and df3 and get an output that tells me which names are shared across dataframes and how many times each one appears:
Names  Matches
Nina   [3]
Doug   [3]
Alli   [2]
Steve  [2]
Bob    [1]
My real dataset has many more dataframes and names, so it would be a bonus if the output could be ordered from the names that appear in the most dataframes to those that appear in the fewest.
I am fairly new to R and not really sure how to even start tackling this. So far I've created lists of the dataframes that I would like to compare to each other, but any suggestions are much appreciated. Thank you for your time!
Upvotes: 1
Views: 164
Reputation: 11514
Try
df1 <- data.frame(names = c("Nina", "Doug", "Alli", "Doug"), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Steve", "Alli", "Nina"), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("Doug", "Steve", "Nina", "Bob"), stringsAsFactors = FALSE)
table(c(df1$names, df2$names, df3$names))
 Alli   Bob  Doug  Nina Steve
    2     1     3     3     2
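Note that table() counts every occurrence, so Doug scores 3 here because the example df1 lists Doug twice. If the goal is the number of dataframes a name appears in, one option is to drop repeats within each dataframe first. This is a minimal sketch using the example data from the question; the variable name per_frame is only illustrative.

```r
df1 <- data.frame(names = c("Nina", "Doug", "Alli", "Doug"), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Steve", "Alli", "Nina"), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("Doug", "Steve", "Nina", "Bob"), stringsAsFactors = FALSE)

# unique() removes repeats inside one data frame, so each data frame
# contributes at most one count per name
per_frame <- table(c(unique(df1$names), unique(df2$names), unique(df3$names)))
per_frame
```

With this version Doug is counted twice (df1 and df3) rather than three times. If the real dataframes truly have no duplicates, both approaches give the same result.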
Or, to get the result as a data frame with named columns:
df <- data.frame(table(c(df1$names, df2$names, df3$names)))
names(df) <- c("Names", "Matches")
Including the ordering:
df[order(df$Matches, decreasing = TRUE), ]
  Names Matches
3  Doug       3
4  Nina       3
1  Alli       2
5 Steve       2
2   Bob       1
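Make sure the names column is character rather than factor. In R versions before 4.0.0, data.frame() creates factors by default, which is why stringsAsFactors = FALSE is set above. If the column is already a factor, convert it with as.character() before combining.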
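Since the question mentions many dataframes already stored in a list, the same idea can probably be applied without typing each dataframe out. The sketch below assumes a list called dfs (a hypothetical name) in which every dataframe has a column called names; it converts to character along the way and sorts from most to fewest matches.

```r
df1 <- data.frame(names = c("Nina", "Doug", "Alli", "Doug"), stringsAsFactors = FALSE)
df2 <- data.frame(names = c("Steve", "Alli", "Nina"), stringsAsFactors = FALSE)
df3 <- data.frame(names = c("Doug", "Steve", "Nina", "Bob"), stringsAsFactors = FALSE)

# Hypothetical list holding all data frames to compare
dfs <- list(df1, df2, df3)

# Pull the names column out of every data frame as character,
# then flatten everything into one vector
all_names <- unlist(lapply(dfs, function(d) as.character(d$names)))

# Count occurrences and sort from most to least frequent
counts <- sort(table(all_names), decreasing = TRUE)

result <- data.frame(Names = names(counts),
                     Matches = as.vector(counts),
                     stringsAsFactors = FALSE)
result
```

Adding more dataframes then only means adding them to dfs; the counting code stays the same.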
Upvotes: 1