firmo23
firmo23

Reputation: 8444

Find the number of times a unique value is appeared in more than one files and the number of those files

I have those 3 dataframes below:

Name<-c("jack","jack","bob","david","mary")
n1<-data.frame(Name)

Name<-c("jack","bill","dean","mary","steven")
n2<-data.frame(Name)

Name<-c("fred","alex","mary")
n3<-data.frame(Name)

I would like to create a new dataframe with 3 columns.All unique names present across all 3 source files in Column 1,the number of source files in which it's located, in Column 2, and the total number of instances of that name across all files, in Column 3.

The result should be like

Name Number_of_files Number_of_instances
1   jack               2                   3
2    bob               1                   1
3  david               1                   1
4   mary               3                   3
5   bill               1                   1
6   dean               1                   1
7 steven               1                   1
8   fred               1                   1
9   alex               1                   1

Is there an automated way to achieve all these at once?

Upvotes: 2

Views: 46

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389155

This is conceptually similar answer as @tmfmnk but a base R version

#Get names of all the objects n1, n2, n3, n4 . etc
name_df <- ls(pattern = "n\\d+")

#Combine them in one dataframe
all_df <- do.call(rbind, Map(cbind, mget(name_df), id = name_df))

#get aggregated values
aggregate(id~Name, all_df, function(x) c(length(unique(x)), length(x)))

#    Name id.1 id.2
#1    bob    1    1
#2  david    1    1
#3   jack    2    3
#4   mary    3    3
#5   bill    1    1
#6   dean    1    1
#7 steven    1    1
#8   alex    1    1
#9   fred    1    1

You can rename the columns if needed.


And for completeness data.table version

library(data.table)

dt < - rbindlist(mget(name_df), idcol = "ID")
dt[,  list(Number_of_files = uniqueN(ID), Number_of_instances = .N), by = .(Name)]

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 40141

One dplyr possibility could be:

bind_rows(n1, n2, n3, .id = "ID") %>%
 group_by(Name) %>%
 summarise(Number_of_files = n_distinct(ID),
           Number_of_instances = n())

  Name   Number_of_files Number_of_instances
  <chr>            <int>               <int>
1 alex                 1                   1
2 bill                 1                   1
3 bob                  1                   1
4 david                1                   1
5 dean                 1                   1
6 fred                 1                   1
7 jack                 2                   3
8 mary                 3                   3
9 steven               1                   1

Upvotes: 4

Related Questions