Reputation: 15
I must be asking the question terribly because I can't find what I looking for!
I have a large excel file that looks like this for every day of the month:
Date | Well1 |
---|---|
1/1/16 | 10 |
1/2/16 | NA |
1/3/16 | NA |
1/4/16 | NA |
1/5/16 | 20 |
1/6/16 | NA |
1/7/16 | 25 |
1/8/16 | NA |
1/9/16 | NA |
1/10/16 | 35 |
etc | NA |
I want to make a new column that has the difference between the non-zero rows and divide that by the number of rows between each non zero row. Aiming for something like this:
Date | Well1 | Adjusted |
---|---|---|
1/1/16 | 10 | =(20-10)/4 = 2.5 |
1/2/16 | NA | 1.25 |
1/3/16 | NA | 1.25 |
1/4/16 | NA | 1.25 |
1/5/16 | 20 | =(25-20)/2= 2.5 |
1/6/16 | NA | 2.5 |
1/7/16 | 25 | =(35-25)/3 = 3.3 |
1/8/16 | NA | 3.3 |
1/9/16 | NA | 3.3 |
1/10/16 | 35 | etc |
etc | NA | etc |
I'm thinking I should use lead
or lag
, but the thing is that the steps are different between each nonzero row (so I'm not sure how to use n in the lead/lag function). I've used group_by
so that each month stands alone, as well as attempted case_when
and ifelse
Mostly need ideas on translating excel format into a workable R format.
Upvotes: 1
Views: 268
Reputation: 5798
Base R in the same vein as @thelatemail but with transformations all in one expression:
nas <- is.na(dat$Well1)
res <- within(dat, {
Date <- as.Date(Date, "%m/%d/%y")
Adjusted <- (diff(Well1[!nas]) /
as.numeric(diff(Date[!nas]), units = "days"))[cumsum(!nas)]
}
)
Data:
dat <- read.table(text="Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 40", header=TRUE, stringsAsFactors=FALSE)
Upvotes: 2
Reputation: 887671
Maybe this should work
library(dplyr)
df1 %>%
#// remove the rows with NA
na.omit %>%
# // create a new column with the lead values of Well1
transmute(Date, Well2 = lead(Well1)) %>%
# // join with original data
right_join(df1 %>%
mutate(rn = row_number())) %>%
# // order by the original order
arrange(rn) %>%
# // create a grouping column based on the NA values
group_by(grp = cumsum(!is.na(Well1))) %>%
# // subtract the first element of Well2 with Well1 and divide
# // by number of rows - n() in the group
mutate(Adjusted = (first(Well2) - first(Well1))/n()) %>%
ungroup %>%
select(-grp, - Well2)
Upvotes: 1
Reputation: 93938
With some diff
-ing and repeating of values, you should be able to get there.
dat$Date <- as.Date(dat$Date, format="%m/%d/%y")
nas <- is.na(dat$Well1)
dat$adj <- with(dat[!nas,],
diff(Well1) / as.numeric(diff(Date), units="days")
)[cumsum(!nas)]
# Date Well1 adj
#1 2016-01-01 10 2.5
#2 2016-01-02 NA 2.5
#3 2016-01-03 NA 2.5
#4 2016-01-04 NA 2.5
#5 2016-01-05 20 2.5
#6 2016-01-06 NA 2.5
#7 2016-01-07 25 5.0
#8 2016-01-08 NA 5.0
#9 2016-01-09 NA 5.0
#10 2016-01-10 40 NA
dat
being used is:
dat <- read.table(text="Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 40", header=TRUE, stringsAsFactors=FALSE)
Upvotes: 3