Reputation: 61
This is my problem: I have two different data frames (A and B). Each column of a data frame is a geographical locality, and the rows of that column are the species found in that locality. I need to intersect the list of species of locality 1 of data frame A with the list of species of every locality of data frame B. To do this I wrote a loop like this:
res <- list()
for (i in 1:length(B)) {
  res[[i]] <- intersect(A[1], B[i])
}
Now I have to repeat the same loop for localities 2, 3, 4, 5, 6, ... of A; that is, I have to intersect all the localities of A with all the localities of B.
Thank you.
Upvotes: 2
Views: 884
Reputation: 4133
It's difficult to fully understand what result you want. But if I have guessed your needs correctly, the code below will do what you want. It can of course be optimized further, because it may be slow on big datasets.
res <- list()
for (i in seq_len(ncol(A))) {      # each locality (column) of A
  res[[i]] <- list()
  for (j in seq_len(ncol(B))) {    # each locality (column) of B
    res[[i]][[j]] <- intersect(A[, i], B[, j])
  }
}
To access a result, use
res[[column_index_in_A]][[column_index_in_B]]
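As a quick check of the loop above, here is a minimal sketch run on made-up data. The contents of A and B, the column names and the species names are assumptions for illustration only, not the asker's data.

```r
# Hypothetical example data: each column is a locality,
# each entry in a column is a species recorded there.
A <- data.frame(loc1 = c("fox", "owl", "bat"),
                loc2 = c("owl", "elk", "fox"),
                stringsAsFactors = FALSE)
B <- data.frame(site1 = c("bat", "fox", "cat"),
                site2 = c("elk", "cat", "dog"),
                stringsAsFactors = FALSE)

res <- list()
for (i in seq_len(ncol(A))) {
  res[[i]] <- list()
  for (j in seq_len(ncol(B))) {
    res[[i]][[j]] <- intersect(A[, i], B[, j])
  }
}

res[[1]][[1]]  # species shared by A's 1st and B's 1st locality
res[[1]][[2]]  # no species in common: an empty character vector
```

Note that res is indexed by position, so you need to keep track of which column number corresponds to which locality.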
Upvotes: 3
Reputation: 96984
Here is an approach similar to nested loops that uses lapply().
If you have a large dataset, lapply() may be faster than a loop that grows its result one element at a time, and it is usually more concise. Loops have a reputation for being slow in R, and it is commonly recommended to use functions in the *apply family where possible.
I'll walk through an example and you can perhaps adapt it to your dataset.
First, we make a sample 3x3 data frame called df, with columns a, b and c, and rows d, e and f:
> df <- data.frame(a = sample(3), b = sample(3), c = sample(3))
> rownames(df) <- c('d','e','f')
Let's look at df and its transpose t(df):
> df
  a b c
d 3 1 3
e 1 3 1
f 2 2 2
> t(df)
  d e f
a 3 1 2
b 1 3 2
c 3 1 2
Let's say we want to intersect the column vectors of df and t(df). We now use nested lapply() statements to run intersect() on column vectors from both df and the transpose t(df):
> result <- lapply(df, function(x) lapply(as.data.frame(t(df)), function(y) intersect(x,y)))
The result is a list() of lists, holding the intersection results:
> is.list(result)
[1] TRUE
> print(result)
$a
$a$d
[1] 3 1
$a$e
[1] 3 1
$a$f
[1] 2
$b
$b$d
[1] 1 3
$b$e
[1] 1 3
$b$f
[1] 2
$c
$c$d
[1] 3 1
$c$e
[1] 3 1
$c$f
[1] 2
Let's look at df and t(df) again, and see how to read these results:
> df
  a b c
d 3 1 3
e 1 3 1
f 2 2 2
> t(df)
  d e f
a 3 1 2
b 1 3 2
c 3 1 2
Let's look at df$a intersected with columns d, e and f of t(df):
$a
$a$d
[1] 3 1
Intersecting the vectors a and d: {3,1,2} ∩ {3,1,3} = {3,1}
$a$e
[1] 3 1
Again, with vectors a and e: {3,1,2} ∩ {1,3,1} = {3,1}
$a$f
[1] 2
And lastly, with vectors a and f: {3,1,2} ∩ {2,2,2} = {2}
The other items in result follow the same pattern.
To extend this to your dataset, think of your data-frame columns as localities, and the transposed-data-frame columns as your species. Then use lapply(), as shown above.
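For the two-data-frame case described in the question, the same nested lapply() pattern might look like the sketch below. The data frames A and B, their column names and the species names are invented for illustration and are not taken from the question.

```r
# Hypothetical example data: columns are localities, entries are species.
A <- data.frame(loc1 = c("fox", "owl", "bat"),
                loc2 = c("owl", "elk", "fox"),
                stringsAsFactors = FALSE)
B <- data.frame(site1 = c("bat", "fox", "cat"),
                site2 = c("elk", "cat", "dog"),
                stringsAsFactors = FALSE)

# The outer lapply() walks over the columns of A, the inner one over
# the columns of B; intersect() sees one column from each.
shared <- lapply(A, function(a) lapply(B, function(b) intersect(a, b)))

# lapply() keeps the column names, so results can be looked up by name.
shared$loc1$site1
shared$loc2$site2
```

Because the result is named after the columns of A and B, you can refer to a pair of localities by name rather than by position.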
To break down the nested lapply() statement, start with the inner lapply():
lapply(as.data.frame(t(df)), function(y) ... )
What this means is that every column vector in t(df) (the columns d, e and f) is represented in turn by the variable y in function(y). We'll come back to ... in a second.
Now let's look at the outer lapply():
lapply(df, function(x) ... )
What this means is that every column vector in df (the columns a, b and c) is represented in turn by the variable x in function(x).
Now let's explain the "..." placeholder.
The outer ... is any function of x: this can be length(), sum(), etc., and even another lapply(). The inner lapply() has its own function and variable name y, and so the inner ... can run a function on both x and y.
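To make the two placeholders concrete, here is a small sketch. It rebuilds df from the values printed earlier rather than calling sample(), so the numbers are fixed; the names lens and sums are hypothetical and chosen only for this example.

```r
# df with the same values as the printout above.
df <- data.frame(a = c(3, 1, 2), b = c(1, 3, 2), c = c(3, 1, 2))
rownames(df) <- c("d", "e", "f")

# Outer "..." replaced by a plain function of x: the length of each column.
lens <- lapply(df, function(x) length(x))

# Outer "..." replaced by another lapply(): the inner function can use
# both x (a column of df) and y (a column of t(df)).
sums <- lapply(df, function(x) {
  lapply(as.data.frame(t(df)), function(y) sum(x) + sum(y))
})

lens$a     # length of column a
sums$a$d   # sum(c(3, 1, 2)) + sum(c(3, 1, 3))
```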
So that's what we do: For every column vector in df, we run a function on that df-vector and every column vector in the transpose t(df). In our example, the function we will run on x and y is intersect():
> result <- lapply(df, function(x) lapply(as.data.frame(t(df)), function(y) intersect(x,y)))
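If you only need to know how many values each pair of columns shares, the nested list can be collapsed into a matrix of counts. This is a sketch on invented data frames A and B (all names here are hypothetical), swapping lapply() for sapply() so that the result is simplified to a matrix.

```r
# Hypothetical example data: columns are localities, entries are species.
A <- data.frame(loc1 = c("fox", "owl", "bat"),
                loc2 = c("owl", "elk", "fox"),
                stringsAsFactors = FALSE)
B <- data.frame(site1 = c("bat", "fox", "cat"),
                site2 = c("elk", "cat", "dog"),
                stringsAsFactors = FALSE)

# Inner sapply() returns one count per column of B; the outer sapply()
# binds those vectors together, one column per locality of A.
counts <- sapply(A, function(a) {
  sapply(B, function(b) length(intersect(a, b)))
})

counts  # rows are the localities of B, columns the localities of A
```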
Upvotes: 5
Reputation: 121167
Here's a wild guess at your data:
A <- data.frame(
  London = c(TRUE, TRUE, FALSE),
  Manchester = c(FALSE, TRUE, FALSE),
  Birmingham = c(TRUE, FALSE, TRUE),
  row.names = c("rats", "mice", "foxes")
)
B <- data.frame(
  London = c(TRUE, FALSE, FALSE),
  Manchester = c(TRUE, TRUE, TRUE),
  Birmingham = c(TRUE, TRUE, FALSE),
  row.names = c("rats", "mice", "foxes")
)
> A
      London Manchester Birmingham
rats    TRUE      FALSE       TRUE
mice    TRUE       TRUE      FALSE
foxes  FALSE      FALSE       TRUE
> B
      London Manchester Birmingham
rats    TRUE       TRUE       TRUE
mice   FALSE       TRUE       TRUE
foxes  FALSE       TRUE      FALSE
In this case, to find species that exist in the same location in both datasets, you just need
as.matrix(A) & as.matrix(B)
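To turn that logical matrix into species names per location, one option is sketched below. It reuses the guessed A and B from above; the names both and shared are hypothetical and introduced only for this example.

```r
A <- data.frame(
  London = c(TRUE, TRUE, FALSE),
  Manchester = c(FALSE, TRUE, FALSE),
  Birmingham = c(TRUE, FALSE, TRUE),
  row.names = c("rats", "mice", "foxes")
)
B <- data.frame(
  London = c(TRUE, FALSE, FALSE),
  Manchester = c(TRUE, TRUE, TRUE),
  Birmingham = c(TRUE, TRUE, FALSE),
  row.names = c("rats", "mice", "foxes")
)

# TRUE where a species is present at that location in both datasets.
both <- as.matrix(A) & as.matrix(B)

# For each location (column), keep the row names where the entry is TRUE.
shared <- lapply(as.data.frame(both), function(present) rownames(both)[present])

shared$London
shared$Manchester
```

This only works if both data frames list the same species in the same row order and the same locations in the same column order.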
Upvotes: 1