Reputation:
Suppose the below data is just a part of very large data that i am using .
mydf<-data.frame(Date=as.Date(c("2015-01-01","2015-01-10","2015-01-27","2015-02-27","2015-03-15","2015-04-17","2015-04-18")),Expense=c(1566,5646,3456,6546,5313,6466,5456),Details=c('item101 xsda','fuel asa','item102a','fuel asa','fuel sda','fuel','item102a'),Vehicle=c('Car','Bike','Car','Car','Bike','Bike','Bike'),Person=c('John','Smith','Robin',rep(NA,3),'Robin'))
Date Expense Details Vehicle Person
1 2015-01-01 1566 item101 xsda Car John
2 2015-01-10 5646 fuel asa Bike Smith
3 2015-01-27 3456 item102a Car Robin
4 2015-02-27 6546 fuel asa Car <NA>
5 2015-03-15 5313 fuel sda Bike <NA>
6 2015-04-17 6466 fuel Bike <NA>
7 2015-04-18 5456 item102a Bike Robin
There are two points to consider
1)When a vehicle 'car' was used and 'fuel' was purchased the person was John
2)When a vehicle 'Bike' was used and 'fuel' was purchased then the person was smith
my desired output is
Date Expense Details Vehicle Person
1 2015-01-01 1566 item101 xsda Car John
2 2015-01-10 5646 fuel Bike Smith
3 2015-01-27 3456 item102a Car Robin
4 2015-02-27 6546 fuel Car John
5 2015-03-15 5313 fuel sda Bike Smith
6 2015-04-17 6466 fuel Bike Smith
7 2015-04-18 5456 item102a Bike Robin
Please tell me how to solve this problem? I used following steps and reached half way towards the solution
mydf$Details<-as.character(mydf$Details)
mydf$Details[grepl('fuel',mydf$Details,ignore.case=TRUE)]<-'Fuel'
mydf
Date Expense Details Vehicle Person
1 2015-01-01 1566 item101 xsda Car John
2 2015-01-10 5646 Fuel Bike Smith
3 2015-01-27 3456 item102a Car Robin
4 2015-02-27 6546 Fuel Car <NA>
5 2015-03-15 5313 Fuel Bike <NA>
6 2015-04-17 6466 Fuel Bike <NA>
7 2015-04-18 5456 item102a Bike Robin
Note: if possible please avoid loops. If there are better and faster ways of doing this please share
Upvotes: 0
Views: 43
Reputation: 6552
You could do in few lines using data.table
:
library(data.table)
setDT(mydf)
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Car", Person := "John"]
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Bike", Person := "Smith"]
mydf
#> Date Expense Details Vehicle Person
#> 1: 2015-01-01 1566 item101 xsda Car John
#> 2: 2015-01-10 5646 fuel asa Bike Smith
#> 3: 2015-01-27 3456 item102a Car Robin
#> 4: 2015-02-27 6546 fuel asa Car John
#> 5: 2015-03-15 5313 fuel sda Bike Smith
#> 6: 2015-04-17 6466 fuel Bike Smith
#> 7: 2015-04-18 5456 item102a Bike Robin
Using dplyr
, you could also do conditional mutate but code is longer. I used stringr
package for string manipulation
library(dplyr)
library(stringr)
mydf %>%
mutate(
Person = ifelse(
is.na(Person) &
str_detect(Details, "fuel") &
Vehicle == "Car",
"John",
ifelse(
is.na(Person) &
str_detect(Details, "fuel") &
Vehicle == "Bike",
"Smith",
as.character(Person)))
)
#> Date Expense Details Vehicle Person
#> 1 2015-01-01 1566 item101 xsda Car John
#> 2 2015-01-10 5646 fuel asa Bike Smith
#> 3 2015-01-27 3456 item102a Car Robin
#> 4 2015-02-27 6546 fuel asa Car John
#> 5 2015-03-15 5313 fuel sda Bike Smith
#> 6 2015-04-17 6466 fuel Bike Smith
#> 7 2015-04-18 5456 item102a Bike Robin
Upvotes: 1
Reputation: 1718
you are halfway there as you said try this two lines:
mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Car'] <- 'John'
mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Bike'] <- 'Smith'
Upvotes: 1