Reputation: 309
I have a dataframe with car repair data. It also tells me if a car broke down (and got repaired). I would like to get rid of all rows with repair events after the car broke down.
Car <- c('A','A','B')
Damage <- c("Wheels","Motor","Motor")
date <- c('01-01-2015','01-01-2016','01-01-2016')
Broke_down <- c(1,0,1)
# data.frame() keeps Broke_down numeric; as.data.frame(cbind(...)) would turn every column into text
df <- data.frame(Car, date, Damage, Broke_down)
Basically, I want to remove all rows for a car that occur after its Broke_down dummy was 1.
So in this case the output should be:
"Car" "date" "Damage" "Broke_down"
A 01-01-2015 Wheels 1
B 01-01-2016 Motor 1
Best, Felix
Upvotes: 1
Views: 74
Reputation: 2535
Here's a solution using split and lapply; data preparation is the same as in the question:
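df2 <- do.call(
  rbind,
  lapply(
    split(df, df$Car),
    function(x) {
      # which.max() on a logical vector returns the index of the first TRUE,
      # i.e. the first breakdown; keep the rows up to and including it
      x[seq_len(which.max(x$Broke_down == 1)), ]
    }
  )
)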
Explanation:
split gives a list of data.frames, one per car.
lapply applies the function in its second argument to each element and returns a list of the results.
Finally, do.call calls rbind with the resulting list of data.frames, giving you one long data.frame again.
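The three steps can also be run one at a time, which makes the intermediate objects visible. The sketch below rebuilds the question's data with data.frame() so that Broke_down stays numeric. It uses a hypothetical helper, keep_until_breakdown, that is not part of the original answer. The helper keeps every row for a car that never broke down, which is an assumption about the desired behaviour.

```r
# Example data from the question, built so that Broke_down is numeric
df <- data.frame(
  Car        = c("A", "A", "B"),
  date       = c("01-01-2015", "01-01-2016", "01-01-2016"),
  Damage     = c("Wheels", "Motor", "Motor"),
  Broke_down = c(1, 0, 1)
)

# Hypothetical helper: keep rows up to and including the first breakdown
keep_until_breakdown <- function(x) {
  first <- which(x$Broke_down == 1)[1]   # NA if the car never broke down
  if (is.na(first)) return(x)            # assumption: keep all rows in that case
  x[seq_len(first), , drop = FALSE]
}

pieces  <- split(df, df$Car)                     # step 1: one data.frame per car
trimmed <- lapply(pieces, keep_until_breakdown)  # step 2: trim each piece
df2     <- do.call(rbind, trimmed)               # step 3: stack the pieces again
```

Inspecting pieces and trimmed before the final rbind is a simple way to check that each car is cut at the intended row.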
There are similar but faster solutions using data.table and dplyr.
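A dplyr version might look like the following sketch. It is not taken from the answer above; it assumes dplyr is installed, that Broke_down is numeric, and that rows are already in chronological order within each car. A row is kept only while no earlier row of the same car has Broke_down equal to 1.

```r
library(dplyr)

# Example data from the question, built so that Broke_down is numeric
df <- data.frame(
  Car        = c("A", "A", "B"),
  date       = c("01-01-2015", "01-01-2016", "01-01-2016"),
  Damage     = c("Wheels", "Motor", "Motor"),
  Broke_down = c(1, 0, 1)
)

df_out <- df %>%
  group_by(Car) %>%
  # lag() shifts the dummy down by one row, so the cumulative sum counts
  # breakdowns strictly before the current row
  filter(cumsum(dplyr::lag(Broke_down, default = 0)) == 0) %>%
  ungroup()
```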
Upvotes: 2
Reputation: 887118
Based on the update in OP's post
library(data.table)
setDT(df)[, .SD[cummin(Broke_down) > 0], Car]
# Car date Damage Broke_down
#1: A 01-01-2015 Wheels 1
#2: B 01-01-2016 Motor 1
Or with ave from base R:
df[with(df, ave(Broke_down, Car, FUN = cummin) > 0),]
# Car date Damage Broke_down
#1 A 01-01-2015 Wheels 1
#3 B 01-01-2016 Motor 1
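Note that cummin(Broke_down) > 0 keeps only the leading run of 1s per car, which fits the example because each car's first row is already the breakdown. If a car can have repairs recorded before its first breakdown (an assumption that goes beyond the question's data), a cumsum-based mask is one way to keep everything up to and including that breakdown. The extra car C below is made up for illustration, and the sketch assumes Broke_down is a numeric 0/1 column with rows in chronological order within each car.

```r
# Question's data plus a made-up car C that is repaired once before breaking down
df <- data.frame(
  Car        = c("A", "A", "B", "C", "C", "C"),
  date       = c("01-01-2015", "01-01-2016", "01-01-2016",
                 "01-01-2014", "01-01-2015", "01-01-2016"),
  Damage     = c("Wheels", "Motor", "Motor", "Wheels", "Motor", "Wheels"),
  Broke_down = c(1, 0, 1, 0, 1, 0)
)

# Number of breakdowns strictly before each row, computed per car
prior <- ave(df$Broke_down, df$Car, FUN = function(b) cumsum(b) - b)

# Keep a row only if the car had not broken down before it
df_general <- df[prior == 0, ]
```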
Upvotes: 1
Reputation: 7023
There might be a more elegant way, but lapply and do.call do the trick:
df_out <- do.call(rbind,lapply(unique(df$Car),function(x){
df_sub <- df[df$Car==x,]
df_sub[1:which(df_sub$Broke_down == 1)[1],]
}))
> df_out
Car date Damage Broke_down
1 A 01-01-2015 Wheels 1
3 B 01-01-2016 Motor 1
Upvotes: 0