Fumia
Fumia

Reputation: 35

Sum certain rows given 2 constraints in R

I am trying to write an conditional statement with the following constraints. Below is an example data frame showing the problem I am running into.

Row <- c(1,2,3,4,5,6,7)
La <- c(51.25,51.25,51.75,53.25,53.25,54.25,54.25)
Lo <- c(128.25,127.75,127.25,119.75,119.25,118.75,118.25)
Y <- c(5,10,2,4,5,7,9)
Cl <- c("EF","EF","EF","EF","NA","NA","CE")     

d <- data.frame(Row,La,Lo,Y,Cl)

  Row    La     Lo  Y Cl
1   1 51.25 128.25  5 EF
2   2 51.25 127.75 10 EF
3   3 51.75 127.25  2 EF
4   4 53.25 119.75  4 EF
5   5 53.25 119.25  5 NA
6   6 54.25 118.75  7 NA
7   7 55.25 118.25  9 CE

I would like to sum column "Y" (removing all values from that row) if "Cl" is NA with the corresponding "Lo" and "La" values that are close (equal to or less than 1.00). In effect, I want to remove NA from being in the data frame without losing the value of "Y", but instead adding this value to its closest neighbor.

I would like the return data frame to look like this:

Row2 <- c(1,2,3,4,7)
La2 <- c(51.25,51.25,51.75,53.25,55.25)
Lo2 <- c(128.25,127.75,127.25,119.75,118.25)
Y2 <- c(5,10,2,9,16)
Cl2 <- c("EF","EF","EF","EF","CE")     

d2 <- data.frame(Row2,La2,Lo2,Y2,Cl2)

 Row2   La2    Lo2 Y2 Cl2
1    1 51.25 128.25  5  EF
2    2 51.25 127.75 10  EF
3    3 51.75 127.25  2  EF
4    4 53.25 119.75  9  EF
5    7 55.25 118.25 16  CE

recent edit: If NA row is close to one row in terms of Lo value and same closeness to another row in La value, join by La value. If there are 2 equally close rows of Lo and La values, join by smaller La value.

Thank you for the help!

Upvotes: 1

Views: 50

Answers (1)

IceCreamToucan
IceCreamToucan

Reputation: 28685

Here is a method to use if you can make some distance matrix m for the distance between all the (La, Lo) rows in your data. I use the output of dist, which is euclidean distance. The row with the lowest distance is selected, or the earliest such row if the lowest distance is shared by > 1 row.

w <- which(is.na(d$Cl))
m <- as.matrix(dist(d[c('La', 'Lo')]))
m[row(m) %in% w] <- NA
d$g <- replace(seq(nrow(d)), w, apply(m[,w], 2, which.min))

library(dplyr)

d %>% 
  group_by(g) %>% 
  summarise(La = La[!is.na(Cl)],
            Lo = Lo[!is.na(Cl)],
            Y = sum(Y), 
            Cl = Cl[!is.na(Cl)]) %>% 
  select(-g)

# # A tibble: 5 x 4
#      La    Lo     Y Cl   
#   <dbl> <dbl> <dbl> <fct>
# 1  51.2  128.     5 EF   
# 2  51.2  128.    10 EF   
# 3  51.8  127.     2 EF   
# 4  53.2  120.     9 EF   
# 5  54.2  118.    16 CE   

Upvotes: 3

Related Questions