Reputation: 795
I have a dataframe (df) of interactions between species i and j (e.g. A_B) in a column called "interact". Each interaction is recorded with the plot and region where it was sampled. I want to find all SHARED interactions among plots within ONE region. So for each region subset, the output should return the duplicate interactions occurring among plots within that region. The data appears as follows:
df<-
region plot interact
1 104 A_B
1 105 B_C
1 106 A_B
1 107 C_D
2 108 B_C
2 109 B_C
2 110 E_F
2 111 B_C
3 112 A_B
3 113 A_B
I want the output to be a dataframe that shows only shared interactions among plots within a region. Unique interactions for each region will be removed. So the output for the above example appears as:
output
region interact
1 A_B
2 B_C
3 A_B
I have tried a for loop
region<-NA
shared.interact<- NA
for (i in 1:length(unique(df$region)) {
region[i] <- unique(df$region)
shared.interact[i]<- duplicated(df$interact)
}
data.frame(region, shared.interaction)
Upvotes: 1
Views: 66
Reputation: 2549
library(data.table)
df<-read.table(header=TRUE,text={"
region plot interacti
1 104 A_B
1 105 B_C
1 106 A_B
1 107 C_D
2 108 B_C
2 109 B_C
2 110 E_F
2 111 B_C
3 112 A_B
3 113 A_B"})
dt <- data.table(df)
Sort the data by region and interacti:
setkey(dt,region,interacti)
Use only the required columns, search for duplicates, and finally apply unique:
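# flag rows whose (region, interacti) pair has already appeared, so duplicates are checked within a region
unique(dt[duplicated(dt, by = c("region", "interacti")), .(region, interacti)])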
# region interacti
# 1: 1 A_B
# 2: 2 B_C
# 3: 3 A_B
Upvotes: 0
Reputation: 3311
Using dplyr you could do:
library(dplyr)
df %>%
group_by(region) %>%
count(interact) %>%
filter(n > 1)
#> # A tibble: 3 x 3
#> # Groups: region [3]
#> region interact n
#> <int> <chr> <int>
#> 1 1 A_B 2
#> 2 2 B_C 3
#> 3 3 A_B 2
You group by region, count how often values occur in interact, and keep those which appear more than once. You could get rid of the new column by adding %>% select(-n) at the end of the pipe.
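One thing to watch: count() counts rows, not plots. In the example data every plot has exactly one row, so the two are the same. If the same interaction could be recorded more than once within a single plot (an assumption on my part, the question does not say this happens), a rough sketch of a variant that counts distinct plots instead would be:

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  region   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  plot     = 104:113,
  interact = c("A_B", "B_C", "A_B", "C_D", "B_C",
               "B_C", "E_F", "B_C", "A_B", "A_B")
)

shared <- df %>%
  distinct(region, plot, interact) %>%  # one row per plot/interaction pair
  count(region, interact) %>%           # number of plots per region/interaction
  filter(n > 1) %>%                     # keep interactions seen in 2+ plots
  select(-n)                            # drop the helper count column

shared
```

Here distinct() collapses repeated records within a plot before counting, so n is the number of plots in which each interaction was seen.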
Upvotes: 0
Reputation: 38520
Here is a base R method that uses a split-apply-combine methodology.
do.call(rbind, lapply(split(df[c("region", "interact")], df$region),
function(x) unique(x[duplicated(x$interact),])))
region interact
1 1 A_B
2 2 B_C
3 3 A_B
Split the subset data.frame on region, then apply a function that returns, for each of these regions, a data.frame with the unique set of observations that have a duplicate. Finally, rbind these together with do.call.
In data.table this would be
library(data.table)
setDT(df)[, unique(interact[duplicated(interact)]), by=region]
region V1
1: 1 A_B
2: 2 B_C
3: 3 A_B
data
df <-
structure(list(region = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L), plot = 104:113, interact = structure(c(1L, 2L, 1L, 3L, 2L,
2L, 4L, 2L, 1L, 1L), .Label = c("A_B", "B_C", "C_D", "E_F"), class = "factor")), .Names = c("region",
"plot", "interact"), class = "data.frame", row.names = c(NA,
-10L))
Upvotes: 1