Reputation: 1
Hope this email finds you safe and healthy!
I'm having a frustrating problem collecting the output of a simple loop that I can't crack.
Here is the loop, which calculates the same summary statistic many times using different upper limits of my data. This gives me correct values and prints them with no issues.
> for (i in maxdist)
+ {
+ homo_sax <- sum(W_distances$distance > 0 & W_distances$distance < i & W_distances$concat_ID=="saxatilis_saxatilis") ## count all sax_sax pairs
+ homo_arc <- sum(W_distances$distance > 0 & W_distances$distance < i & W_distances$concat_ID=="arcana_arcana") ## count all the arc_arc pairs
+ hetero <- sum(W_distances$distance > 0 & W_distances$distance < i & W_distances$concat_ID=="arcana_saxatilis") ## count all the arc_sax pairs
+ total_homo = homo_sax + homo_arc ## calculate the total number of homo observations
+ temp_RI <- 1-2*hetero/(total_homo+hetero) ### calculate RI according to equation RI4 from Sobel & Chen (2013)
+ print(temp_RI)
+ }
[1] 0.2046285
[1] 0.1603105
[1] 0.1195596
[1] 0.01857161
[1] 0.01784158
[1] 0.01498829
>
The problem arises when I try to save these values to a pre-allocated vector: the stored numbers differ from the printed ones and are not correct:
> maxdist <- seq(from = 0.5, to = 3, by = 0.5) ## the max distance for each bin
RI_bins <- vector("numeric",length(maxdist))
for (i in maxdist)
{
homo_sax <- sum(W_distances$distance > 0 & W_distances$distance < i & W_distances$concat_ID=="saxatilis_saxatilis") ## collect all sax_sax pairs
homo_arc <- sum(W_distances$distance > 0 & W_distances$distance < i & W_distances$concat_ID=="arcana_arcana") ## collect all the arc_arc pairs
hetero <- sum(W_distances$distance > 0 & W_distances$distance < i & W_distances$concat_ID=="arcana_saxatilis")
total_homo = homo_sax + homo_arc
temp_RI <- 1-2*hetero/(total_homo+hetero) ### equation RI4 from Sobel & Chen (2013)
RI_bins[i]<-temp_RI
}
> RI_bins
[1] 0.11955961 0.01784158 0.01498829 0.00000000 0.00000000 0.00000000
I'd be very grateful if someone could help me to understand what I am missing here. Thanks in advance!!
Sean
Upvotes: 0
Views: 26
Reputation: 160447
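The values in maxdist are floating-point numbers, so they should not be used as vector indices. When you index with a fractional number, R truncates it toward zero and uses the resulting integer position, so several values of maxdist end up writing to the same element (and 0.5 becomes index 0, which assigns nothing).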
vec <- numeric(5)
vec
# [1] 0 0 0 0 0
vec[2.5] <- 3
vec
# [1] 0 3 0 0 0
vec[2.1] <- 2
vec
# [1] 0 2 0 0 0
vec[2.9] <- 1
vec
# [1] 0 1 0 0 0
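Applied to your maxdist, the same truncation can be sketched as follows (idx and vec are names introduced here for illustration only). It reproduces the pattern in your RI_bins, where only the first three elements are filled and each holds the last value written to it:

```r
maxdist <- seq(from = 0.5, to = 3, by = 0.5)

# as.integer() truncates toward zero, the same way a fractional index is treated
idx <- as.integer(maxdist)
idx
# [1] 0 1 1 2 2 3

# Indexing by the value itself: 0.5 maps to index 0 (nothing is assigned),
# 1 and 1.5 both write to position 1, 2 and 2.5 to position 2, 3 to position 3
vec <- numeric(length(maxdist))
for (i in maxdist) vec[i] <- i
vec
# [1] 1.5 2.5 3.0 0.0 0.0 0.0
```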
Instead, try this code. The two changes are: using seq_along to loop over the positions of maxdist rather than its values, and assigning maxdist[i] to md, which is then used in the comparisons while i is used only as the index.
maxdist <- seq(from = 0.5, to = 3, by = 0.5) ## the max distance for each bin
RI_bins <- vector("numeric",length(maxdist))
for (i in seq_along(maxdist)) {
md <- maxdist[i]
homo_sax <- sum(W_distances$distance > 0 & W_distances$distance < md & W_distances$concat_ID=="saxatilis_saxatilis") ## collect all sax_sax pairs
homo_arc <- sum(W_distances$distance > 0 & W_distances$distance < md & W_distances$concat_ID=="arcana_arcana") ## collect all the arc_arc pairs
hetero <- sum(W_distances$distance > 0 & W_distances$distance < md & W_distances$concat_ID=="arcana_saxatilis")
total_homo = homo_sax + homo_arc
temp_RI <- 1-2*hetero/(total_homo+hetero) ### equation RI4 from Sobel & Chen (2013)
RI_bins[i]<-temp_RI
}
(Untested, as we don't have your data.)
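If you want to check the approach without the real data, here is a rough sketch using a made-up W_distances (the distances and IDs below are invented purely for illustration, and calc_RI is a helper name I'm introducing, not something from your code). It also shows vapply as one possible alternative that avoids manual indexing altogether:

```r
# Hypothetical stand-in for W_distances; these values are invented
W_distances <- data.frame(
  distance  = c(0.2, 0.4, 0.7, 1.2, 1.8, 2.4, 2.9, 0.3, 1.1, 2.2),
  concat_ID = c("saxatilis_saxatilis", "arcana_arcana", "arcana_saxatilis",
                "saxatilis_saxatilis", "arcana_saxatilis", "arcana_arcana",
                "arcana_saxatilis", "saxatilis_saxatilis", "arcana_arcana",
                "saxatilis_saxatilis")
)

maxdist <- seq(from = 0.5, to = 3, by = 0.5)

# RI for a single upper distance limit, same formula as in the loop above
calc_RI <- function(md, d) {
  in_bin <- d$distance > 0 & d$distance < md
  homo   <- sum(in_bin & d$concat_ID %in% c("saxatilis_saxatilis", "arcana_arcana"))
  hetero <- sum(in_bin & d$concat_ID == "arcana_saxatilis")
  1 - 2 * hetero / (homo + hetero)
}

# Loop over positions, as in the corrected code
RI_loop <- numeric(length(maxdist))
for (i in seq_along(maxdist)) {
  RI_loop[i] <- calc_RI(maxdist[i], W_distances)
}

# Same calculation without explicit indexing; one numeric value per maxdist
RI_vapply <- vapply(maxdist, calc_RI, numeric(1), d = W_distances)
```

Both RI_loop and RI_vapply should have one entry per element of maxdist, in the same order.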
Upvotes: 1