Danielle

Reputation: 795

Make a new dataframe of duplicate characters for each level of a subsetted factor

I have a data frame (df) of interactions between species i and j (e.g. A_B), stored in a column called "interact". Each interaction is recorded together with the plot and region where it was sampled. I want to find all interactions that are SHARED among plots within ONE region, so for each region the output should return the interactions that occur in more than one plot of that region. The data look like this:

df<-

region     plot    interact
 1          104      A_B  
 1          105      B_C
 1          106      A_B
 1          107      C_D
 2          108      B_C
 2          109      B_C
 2          110      E_F
 2          111      B_C
 3          112      A_B
 3          113      A_B

I want the output to be a dataframe that shows only shared interactions among plots within a region. Unique interactions for each region will be removed. So the output for the above example appears as:

output

 region    interact
  1          A_B
  2          B_C
  3          A_B

I have tried a for loop

region<-NA
shared.interact<- NA

for (i in 1:length(unique(df$region))) {
region[i] <- unique(df$region)          
shared.interact[i]<- duplicated(df$interact)
}


data.frame(region, shared.interact)

Upvotes: 1

Views: 66

Answers (3)

DJJ

Reputation: 2549

library(data.table)


df<-read.table(header=TRUE,text={"
region     plot    interacti
1          104      A_B  
1          105      B_C
1          106      A_B
1          107      C_D
2          108      B_C
2          109      B_C
2          110      E_F
2          111      B_C
3          112      A_B
3          113      A_B"})

dt <- data.table(df)

Sort the data by region and interacti:

setkey(dt,region,interacti)

Keep only the required columns, flag the rows that are duplicates, and then reduce them to unique rows:

unique(dt[,.(region,interacti)][duplicated(region)&duplicated(interacti),])

#    region interacti
# 1:      1       A_B
# 2:      2       B_C
# 3:      3       A_B
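Note that `duplicated(region) & duplicated(interacti)` tests the two columns independently, so it can flag a row whose region and interaction have each been seen before but never together. A minimal sketch of a pair-wise check follows, assuming data.table's `duplicated()` method with its `by` argument. The three-row table is a hypothetical counterexample, not data from the question.

```r
library(data.table)

# Hypothetical counterexample (not from the question): B_C occurs once in
# region 1 and once in region 2, so nothing is shared within a region.
dt <- data.table(
  region    = c(1L, 2L, 2L),
  plot      = c(201L, 202L, 203L),
  interacti = c("B_C", "A_B", "B_C")
)
setkey(dt, region, interacti)

# Column-wise test: flags (2, B_C) because its region and its interaction
# were each seen earlier, although never in the same row.
columnwise <- unique(
  dt[, .(region, interacti)][duplicated(region) & duplicated(interacti), ]
)

# Pair-wise test: a row counts only if the same region/interaction pair
# has already appeared.
pairwise <- unique(
  dt[duplicated(dt, by = c("region", "interacti")), .(region, interacti)]
)

nrow(columnwise)  # one falsely flagged row
nrow(pairwise)    # no shared interactions
```

On the example data from the question both variants return the same three rows; they differ only when an interaction appears in several regions but just once within one of them.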

Upvotes: 0

Thomas K

Reputation: 3311

Using dplyr you could do:

library(dplyr)

df %>% 
  group_by(region) %>% 
  count(interact) %>% 
  filter(n > 1)
#> # A tibble: 3 x 3
#> # Groups:   region [3]
#>   region interact     n
#>    <int>    <chr> <int>
#> 1      1      A_B     2
#> 2      2      B_C     3
#> 3      3      A_B     2

You group by region, count how often values occur in interact and keep those which appear more than once. You could get rid of the new column by adding %>% select(-n) at the end of the pipe.
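For reference, here is a sketch of the full pipe with the count column dropped. It swaps in `count(region, interact)` as a shorthand for `group_by(region) %>% count(interact)`, which should leave the result ungrouped. The data frame is rebuilt from the question, with `interact` stored as character, which is an assumption.

```r
library(dplyr)

# Example data from the question (interact stored as character here).
df <- data.frame(
  region   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  plot     = 104:113,
  interact = c("A_B", "B_C", "A_B", "C_D", "B_C",
               "B_C", "E_F", "B_C", "A_B", "A_B"),
  stringsAsFactors = FALSE
)

shared <- df %>%
  count(region, interact) %>%  # one row per region/interaction pair, frequency in n
  filter(n > 1) %>%            # keep pairs seen in more than one plot
  select(-n)                   # drop the helper column

shared
```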

Upvotes: 0

lmo

Reputation: 38520

Here is a base R method that uses a split-apply-combine methodology.

do.call(rbind, lapply(split(df[c("region", "interact")], df$region),
                      function(x) unique(x[duplicated(x$interact),])))
  region interact
1      1      A_B
2      2      B_C
3      3      A_B

Split the subsetted data.frame on region, then apply a function that, for each of these regions, returns a data.frame with the unique set of observations that have a duplicate. Finally, rbind these together with do.call.
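The same one-liner can be unpacked into its three stages, which may make it easier to inspect the intermediate results. This is only a sketch: the names `pieces`, `shared_in_region`, `shared` and `result` are made up for illustration, and `interact` is stored as character rather than as a factor.

```r
# Example data from the question (interact stored as character here).
df <- data.frame(
  region   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  plot     = 104:113,
  interact = c("A_B", "B_C", "A_B", "C_D", "B_C",
               "B_C", "E_F", "B_C", "A_B", "A_B"),
  stringsAsFactors = FALSE
)

# Step 1 (split): one data.frame per region, keeping only the needed columns.
pieces <- split(df[c("region", "interact")], df$region)

# Step 2 (apply): within a region, keep rows whose interaction has already
# appeared, then collapse repeats (B_C occurs three times in region 2).
shared_in_region <- function(x) unique(x[duplicated(x$interact), ])
shared <- lapply(pieces, shared_in_region)

# Step 3 (combine): stack the per-region results into a single data.frame.
result <- do.call(rbind, shared)
rownames(result) <- NULL  # drop the row names inherited from the split
result
```

A region without any shared interaction contributes a zero-row data.frame in step 2, so it simply drops out of the combined result.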


In data.table this would be

library(data.table)
setDT(df)[, unique(interact[duplicated(interact)]), by=region]
   region  V1
1:      1 A_B
2:      2 B_C
3:      3 A_B

data

df <-
structure(list(region = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
3L), plot = 104:113, interact = structure(c(1L, 2L, 1L, 3L, 2L, 
2L, 4L, 2L, 1L, 1L), .Label = c("A_B", "B_C", "C_D", "E_F"), class = "factor")), .Names = c("region", 
"plot", "interact"), class = "data.frame", row.names = c(NA, 
-10L))

Upvotes: 1
