lennymckenny
lennymckenny

Reputation: 15

Reference the previous non-zero row, find the difference and divide by nrows

I must be asking the question terribly because I can't find what I looking for!

I have a large excel file that looks like this for every day of the month:

Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 35
etc NA

I want to make a new column that has the difference between the non-zero rows and divide that by the number of rows between each non zero row. Aiming for something like this:

Date Well1 Adjusted
1/1/16 10 =(20-10)/4 = 2.5
1/2/16 NA 1.25
1/3/16 NA 1.25
1/4/16 NA 1.25
1/5/16 20 =(25-20)/2= 2.5
1/6/16 NA 2.5
1/7/16 25 =(35-25)/3 = 3.3
1/8/16 NA 3.3
1/9/16 NA 3.3
1/10/16 35 etc
etc NA etc

I'm thinking I should use lead or lag, but the thing is that the steps are different between each nonzero row (so I'm not sure how to use n in the lead/lag function). I've used group_by so that each month stands alone, as well as attempted case_when and ifelse Mostly need ideas on translating excel format into a workable R format.

Upvotes: 1

Views: 268

Answers (3)

hello_friend
hello_friend

Reputation: 5798

Base R in the same vein as @thelatemail but with transformations all in one expression:

nas <- is.na(dat$Well1)
res <- within(dat, {
    Date <- as.Date(Date, "%m/%d/%y")
    Adjusted <- (diff(Well1[!nas]) / 
      as.numeric(diff(Date[!nas]), units = "days"))[cumsum(!nas)]
  }
)

Data:

dat <- read.table(text="Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 40", header=TRUE, stringsAsFactors=FALSE)

Upvotes: 2

akrun
akrun

Reputation: 887671

Maybe this should work

library(dplyr)
df1 %>% 
  #// remove the rows with NA
  na.omit %>%
  # // create a new column with the lead values of Well1
  transmute(Date, Well2 = lead(Well1)) %>% 
  # // join with original data
  right_join(df1 %>%
              mutate(rn = row_number())) %>%
  # // order by the original order 
  arrange(rn) %>%
  # // create a grouping column based on the NA values
  group_by(grp = cumsum(!is.na(Well1))) %>%
  # // subtract the first element of Well2 with Well1 and divide
  # // by number of rows - n() in the group
  mutate(Adjusted = (first(Well2) - first(Well1))/n()) %>% 
  ungroup %>% 
  select(-grp, - Well2)

Upvotes: 1

thelatemail
thelatemail

Reputation: 93938

With some diff-ing and repeating of values, you should be able to get there.

dat$Date <- as.Date(dat$Date, format="%m/%d/%y")
nas <- is.na(dat$Well1)
dat$adj <- with(dat[!nas,],
              diff(Well1) / as.numeric(diff(Date), units="days")
           )[cumsum(!nas)]

#         Date Well1 adj
#1  2016-01-01    10 2.5
#2  2016-01-02    NA 2.5
#3  2016-01-03    NA 2.5
#4  2016-01-04    NA 2.5
#5  2016-01-05    20 2.5
#6  2016-01-06    NA 2.5
#7  2016-01-07    25 5.0
#8  2016-01-08    NA 5.0
#9  2016-01-09    NA 5.0
#10 2016-01-10    40  NA

dat being used is:

dat <- read.table(text="Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 40", header=TRUE, stringsAsFactors=FALSE)

Upvotes: 3

Related Questions