Reputation: 35
I am trying to write an conditional statement with the following constraints. Below is an example data frame showing the problem I am running into.
Row <- c(1,2,3,4,5,6,7)
La <- c(51.25,51.25,51.75,53.25,53.25,54.25,54.25)
Lo <- c(128.25,127.75,127.25,119.75,119.25,118.75,118.25)
Y <- c(5,10,2,4,5,7,9)
Cl <- c("EF","EF","EF","EF","NA","NA","CE")
d <- data.frame(Row,La,Lo,Y,Cl)
Row La Lo Y Cl
1 1 51.25 128.25 5 EF
2 2 51.25 127.75 10 EF
3 3 51.75 127.25 2 EF
4 4 53.25 119.75 4 EF
5 5 53.25 119.25 5 NA
6 6 54.25 118.75 7 NA
7 7 55.25 118.25 9 CE
I would like to sum column "Y" (removing all values from that row) if "Cl" is NA with the corresponding "Lo" and "La" values that are close (equal to or less than 1.00). In effect, I want to remove NA from being in the data frame without losing the value of "Y", but instead adding this value to its closest neighbor.
I would like the return data frame to look like this:
Row2 <- c(1,2,3,4,7)
La2 <- c(51.25,51.25,51.75,53.25,55.25)
Lo2 <- c(128.25,127.75,127.25,119.75,118.25)
Y2 <- c(5,10,2,9,16)
Cl2 <- c("EF","EF","EF","EF","CE")
d2 <- data.frame(Row2,La2,Lo2,Y2,Cl2)
Row2 La2 Lo2 Y2 Cl2
1 1 51.25 128.25 5 EF
2 2 51.25 127.75 10 EF
3 3 51.75 127.25 2 EF
4 4 53.25 119.75 9 EF
5 7 55.25 118.25 16 CE
recent edit: If NA row is close to one row in terms of Lo value and same closeness to another row in La value, join by La value. If there are 2 equally close rows of Lo and La values, join by smaller La value.
Thank you for the help!
Upvotes: 1
Views: 50
Reputation: 28685
Here is a method to use if you can make some distance matrix m
for the distance between all the (La, Lo) rows in your data. I use the output of dist
, which is euclidean distance. The row with the lowest distance is selected, or the earliest such row if the lowest distance is shared by > 1 row.
w <- which(is.na(d$Cl))
m <- as.matrix(dist(d[c('La', 'Lo')]))
m[row(m) %in% w] <- NA
d$g <- replace(seq(nrow(d)), w, apply(m[,w], 2, which.min))
library(dplyr)
d %>%
group_by(g) %>%
summarise(La = La[!is.na(Cl)],
Lo = Lo[!is.na(Cl)],
Y = sum(Y),
Cl = Cl[!is.na(Cl)]) %>%
select(-g)
# # A tibble: 5 x 4
# La Lo Y Cl
# <dbl> <dbl> <dbl> <fct>
# 1 51.2 128. 5 EF
# 2 51.2 128. 10 EF
# 3 51.8 127. 2 EF
# 4 53.2 120. 9 EF
# 5 54.2 118. 16 CE
Upvotes: 3