Reputation:
I am trying to find all matching values in a specific column, in a list of data.frames. However, I keep getting a returned value of character(0)
.
I have tried the following: Simple subset (very time consuming) -> e.g. dat[[i]][[i]] lapply w/ Reduce and intersect (as seen here
LocA<-data.frame(obs.date=c("2018-01-10","2018-01-14","2018-01-20),
obs.count=c(2,0,1))
LocB<-data.frame(obs.date=c("2018-01-09","2018-01-14","2018-01-20),
obs.count=c(0,3,5))
LocC<-data.frame(obs.date=c("2018-01-12","2018-01-14","2018-01-19"),
obs.count=c(2,0,1))
LocD<-data.frame(obs.date=c("2018-01-11","2018-01-16","2018-01-21"),
obs.count=c(2,0,1))
dfList<-list(LocA,LocB,LocC,LocD)
##List of all dates
lapply(dfList,'[[',1)
[1]"2018-01-10" "2018-01-14" "2018-01-20" "2018-01-09"...
Attempts (failure)
>Reduce(intersect,lapply(dfList,'[[',1))
character (0)
I expect the output of this function to be an output identifying the data.frames that share a common date.
*Extra smiles if someone know how to identify shared dates and mutate in to a single data frame where..Col1 = dataframe name, Col2=obs.date,Col3 = obs.count
Upvotes: 3
Views: 2234
Reputation: 736
You can first merge all the data frames so you only have one:
a <- Reduce(function(x, y) merge(x, y, all = TRUE), dfList)
Or you can merge them like this:
a <-rbind(LocA,LocB,LocC,LocD)
Afterwards, you can extract all the duplicates:
b <- a[duplicated(a$obs.date), ]
Or if you want to keep all the unique ones and keep the duplicates:
c <- a[!duplicated(a$obs.date), ]
Upvotes: 1
Reputation: 5766
If by "intersect" you mean doing an "inner join" or "merging" with a specific column as key, then -- you want to use dplyr::inner_join
or merge
.
First, between two data.frames:
library(dplyr)
inner_join(LocA, LocB, by='obs.date')
# 2 rows
inner_join(LocC, LocD, by='obs.date')
# zero rows
So, not infinite merging.
We'll combine the data first, then count the occurences. Notice the use of the .id
-argument to track where the row originated.
library(dplyr)
bind_rows(dfList, .id = 'id') %>%
add_count(obs.date) %>%
filter(n > 1)
# A tibble: 5 x 4
id obs.date obs.count n
<chr> <chr> <dbl> <int>
1 1 2018-01-14 0 3
2 1 2018-01-20 1 2
3 2 2018-01-14 3 3
4 2 2018-01-20 5 2
5 3 2018-01-14 0 3
Upvotes: 0