Reputation: 301
I have a data frame like this:
wpt ID Fuel Dist Express Local
1 S36 12 1 1 0
2 S36 14 2 1 0
inter S36 NA NA 1 0
inter S36 NA NA 1 0
3 S36 16 4 1 0
inter S36 NA NA 0 1
4 S36 18 6 0 1
5 S36 22 7 0 1
6 W09 45 9 1 0
inter W09 NA NA 1 0
inter W09 NA NA 1 0
inter W09 NA NA 1 0
7 W09 48 14 0 1
8 W09 50 15 0 1
(1)I'd like to interpolate and insert values to the places with NAs of Fuel and Dist column. I treat the rows with "inter" together with it start and end rows of regular numbered "wpt" as a unit. Then conduct interpolation.
The expected output is like this:
wpt ID Fuel Dist Express Local
1 S36 12 1 1 0
2 S36 14 2 1 0
inter S36 14.6667 2.67 1 0
inter S36 15.3333 3.33 1 0
3 S36 16 4 1 0
inter S36 17 5 0 1
4 S36 18 6 0 1
5 S36 22 7 0 1
6 W09 45 9 1 0
inter W09 45.75 10.25 1 0
inter W09 46.50 11.50 1 0
inter W09 47.25 12.75 1 0
7 W09 48 14 0 1
8 W09 50 15 0 1
To be clear, the first segment interpolation is calculated like this:
> seq(14,16,length.out = 4)
[1] 14.00000 14.66667 15.33333 16.00000
(2) Then I want to get cumulative sum of each class of Express and Local by ID. The expected output is like this:
ID Cumsum.Fuel Cumsum.Dist Express Local
S36 4 3 1 0
S36 5 2 0 1
W09 2.25 3.75 1 0
W09 2 1 0 1
To be clear, Cum.sum.Fuel for "S36" for Express is 16-12=4. The same applies to others.
Thanks in advance!!!
Upvotes: 1
Views: 468
Reputation: 887951
To fill both the columns, we can use mutate_at
after grouping by "ID"
library(dplyr)
library(zoo)
df2 <- df1 %>%
group_by(ID) %>%
mutate_at(vars(Fuel, Dist), na.approx)
df2
# wpt ID Fuel Dist Express Local
# <chr> <chr> <dbl> <dbl> <int> <int>
#1 1 S36 12.00000 1.000000 1 0
#2 2 S36 14.00000 2.000000 1 0
#3 inter S36 14.66667 2.666667 1 0
#4 inter S36 15.33333 3.333333 1 0
#5 3 S36 16.00000 4.000000 1 0
#6 inter S36 17.00000 5.000000 0 1
#7 4 S36 18.00000 6.000000 0 1
#8 5 S36 22.00000 7.000000 0 1
#9 6 W09 45.00000 9.000000 1 0
#10 inter W09 45.75000 10.250000 1 0
#11 inter W09 46.50000 11.500000 1 0
#12 inter W09 47.25000 12.750000 1 0
#13 7 W09 48.00000 14.000000 0 1
#14 8 W09 50.00000 15.000000 0 1
For the second part,
library(data.table)
df2 %>%
group_by(ID, Express1 = rleid(Express), Local1 = rleid(Local)) %>%
summarise(Express = first(Express),
Local = first(Local),
Cumsum.Fuel = last(Fuel) - first(Fuel),
Cumsum.Dist = last(Dist) - first(Dist)) %>%
ungroup() %>%
select(-Express1, - Local1)
#Source: local data frame [4 x 5]
# ID Express Local Cumsum.Fuel Cumsum.Dist
# <chr> <int> <int> <dbl> <dbl>
#1 S36 1 0 4.00 3.00
#2 S36 0 1 5.00 2.00
#3 W09 1 0 2.25 3.75
#4 W09 0 1 2.00 1.00
Or we can do this without the rleid
df2 %>%
group_by(ID, Express, Local) %>%
summarise(Cumsum.Fuel = last(Fuel) - first(Fuel),
Cumsum.Dist = last(Dist) - first(Dist))
Upvotes: 1
Reputation: 9628
For the first task you can use:
library(zoo)
na.approx(df$Fuel)
[1] 12.00000 14.00000 14.66667 15.33333 16.00000 17.00000 18.00000 22.00000 45.00000 45.75000
[11] 46.50000 47.25000 48.00000 50.00000
Upvotes: 3