Reputation:

Replacing a value in data frame column only when it matches two other column values

Suppose the below data is just a part of very large data that i am using .

mydf<-data.frame(Date=as.Date(c("2015-01-01","2015-01-10","2015-01-27","2015-02-27","2015-03-15","2015-04-17","2015-04-18")),Expense=c(1566,5646,3456,6546,5313,6466,5456),Details=c('item101 xsda','fuel asa','item102a','fuel asa','fuel sda','fuel','item102a'),Vehicle=c('Car','Bike','Car','Car','Bike','Bike','Bike'),Person=c('John','Smith','Robin',rep(NA,3),'Robin'))

Date           Expense      Details        Vehicle    Person
1 2015-01-01    1566        item101 xsda   Car        John
2 2015-01-10    5646        fuel asa       Bike       Smith
3 2015-01-27    3456        item102a       Car        Robin
4 2015-02-27    6546        fuel asa       Car        <NA>
5 2015-03-15    5313        fuel sda       Bike       <NA>
6 2015-04-17    6466        fuel           Bike       <NA>
7 2015-04-18    5456        item102a       Bike       Robin

There are two points to consider

1)When a vehicle 'car' was used and 'fuel' was purchased the person was John

2)When a vehicle 'Bike' was used and 'fuel' was purchased then the person was smith

my desired output is

     Date       Expense  Details        Vehicle    Person
 1 2015-01-01    1566    item101 xsda     Car      John
 2 2015-01-10    5646    fuel             Bike     Smith
 3 2015-01-27    3456    item102a         Car      Robin
 4 2015-02-27    6546    fuel             Car      John
 5 2015-03-15    5313    fuel sda         Bike     Smith
 6 2015-04-17    6466    fuel             Bike     Smith
 7 2015-04-18    5456    item102a         Bike     Robin

Please tell me how to solve this problem? I used following steps and reached half way towards the solution

mydf$Details<-as.character(mydf$Details)
mydf$Details[grepl('fuel',mydf$Details,ignore.case=TRUE)]<-'Fuel'

mydf

    Date     Expense      Details        Vehicle    Person
1 2015-01-01    1566      item101 xsda   Car        John
2 2015-01-10    5646      Fuel           Bike       Smith
3 2015-01-27    3456      item102a       Car        Robin
4 2015-02-27    6546      Fuel           Car        <NA>
5 2015-03-15    5313      Fuel           Bike       <NA>
6 2015-04-17    6466      Fuel           Bike       <NA>
7 2015-04-18    5456      item102a       Bike       Robin

Note: if possible please avoid loops. If there are better and faster ways of doing this please share

Upvotes: 0

Answers (2)

cderv

Reputation: 6552

You could do in few lines using data.table:

library(data.table)

setDT(mydf)

mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Car", Person := "John"]
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Bike", Person := "Smith"]

mydf
#>          Date Expense      Details Vehicle Person
#> 1: 2015-01-01    1566 item101 xsda     Car   John
#> 2: 2015-01-10    5646     fuel asa    Bike  Smith
#> 3: 2015-01-27    3456     item102a     Car  Robin
#> 4: 2015-02-27    6546     fuel asa     Car   John
#> 5: 2015-03-15    5313     fuel sda    Bike  Smith
#> 6: 2015-04-17    6466         fuel    Bike  Smith
#> 7: 2015-04-18    5456     item102a    Bike  Robin

Using dplyr, you could also do conditional mutate but code is longer. I used stringr package for string manipulation

library(dplyr)
library(stringr)
mydf %>%
  mutate(
    Person = ifelse(
      is.na(Person) & 
        str_detect(Details, "fuel") & 
        Vehicle == "Car", 
      "John",
      ifelse(
        is.na(Person) & 
          str_detect(Details, "fuel") & 
          Vehicle == "Bike", 
        "Smith", 
        as.character(Person)))
  )
#>         Date Expense      Details Vehicle Person
#> 1 2015-01-01    1566 item101 xsda     Car   John
#> 2 2015-01-10    5646     fuel asa    Bike  Smith
#> 3 2015-01-27    3456     item102a     Car  Robin
#> 4 2015-02-27    6546     fuel asa     Car   John
#> 5 2015-03-15    5313     fuel sda    Bike  Smith
#> 6 2015-04-17    6466         fuel    Bike  Smith
#> 7 2015-04-18    5456     item102a    Bike  Robin

Upvotes: 1

Zahiro Mor

Reputation: 1718

you are halfway there as you said try this two lines:

mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Car']  <- 'John'
mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Bike']  <- 'Smith'

Upvotes: 1

Replacing a value in data frame column only when it matches two other column values

Answers (2)

Related Questions