Nihat

Reputation: 59

Conditionally mutating an R data frame based on string contents

I am using R and trying to create a new column from the string contents of existing columns.

My data looks like this:

risk_code          |  area
-----------------------------------
DEEP DIGGING ALL   |  --
CONSTRUCTION PRO   |  Construction
CLAIMS ONSHORE     |  --
OFFSHORE CLAIMS    |  --

And the result I need is:

risk_code          |  area          |  area_new
-------------------------------------------------
DEEP DIGGING ALL   |  --            |  Digging
CONSTRUCTION PRO   |  Construction  |  Construction
CLAIMS ONSHORE     |  --            |  Onshore
OFFSHORE CLAIMS    |  --            |  Offshore

I understand that I have made several mistakes in the code, but after a whole week of staring at it and searching the internet, I still cannot get the result I need. I would appreciate your help. Thanks in advance.

Occupancy <- read_excel("Occupancy.xlsx")

OccupancyMutated <- mutate(Occupancy, area_new = area)
OccupancyMutated <- as.data.frame(OccupancyMutated)

OccupancyMutated$area_new[Occupancy$area == "--"] <- 
{ 
  if (OccupancyMutated$risk_code == %Digging%) {"Digging"}
else if (OccupancyMutated$risk_code == %ONSHORE%) {"Onshore"}
else if (OccupancyMutated$risk_code == %OFFSHORE%) {"Offshore"}
  else {"empty"}
}
View(OccupancyMutated)

Upvotes: 1

Views: 79

Answers (2)

Nihat

Reputation: 59

So, this is the answer (thanks to Sotos):

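library(readxl)
library(dplyr)
library(stringr)

Occupancy <- read_excel("Occupancy.xlsx")

OccupancyMutated <- mutate(Occupancy, area_new = area)
OccupancyMutated <- as.data.frame(OccupancyMutated)

# Fill only the rows without an area; subset both sides so the lengths match
missing <- OccupancyMutated$area == "--"
# The pattern is lowercase because risk_code is lowercased before matching
OccupancyMutated$area_new[missing] <-
  str_to_title(str_extract(tolower(OccupancyMutated$risk_code[missing]), 'digging|offshore|onshore'))

View(OccupancyMutated)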
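
For comparison, the same logic can also be written as a dplyr case_when() chain, which reads closer to the if/else attempt in the question. This is only a sketch: the inline data frame is copied from the sample table and stands in for Occupancy.xlsx, and the "empty" fallback is taken from the original attempt.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question (assumption: stands in for Occupancy.xlsx)
Occupancy <- data.frame(
  risk_code = c("DEEP DIGGING ALL", "CONSTRUCTION PRO",
                "CLAIMS ONSHORE", "OFFSHORE CLAIMS"),
  area = c("--", "Construction", "--", "--"),
  stringsAsFactors = FALSE
)

# Conditions are checked in order; the first one that is TRUE wins
OccupancyMutated <- Occupancy %>%
  mutate(area_new = case_when(
    area != "--"                      ~ area,       # keep an existing area
    str_detect(risk_code, "DIGGING")  ~ "Digging",
    str_detect(risk_code, "ONSHORE")  ~ "Onshore",
    str_detect(risk_code, "OFFSHORE") ~ "Offshore",
    TRUE                              ~ "empty"     # no keyword found
  ))

print(OccupancyMutated$area_new)
```

Unlike if (), which expects a single TRUE or FALSE, case_when() evaluates every condition for the whole column at once, so no loop is needed.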

Upvotes: 1

Sotos

Reputation: 51582

We can use stringr for this. The function word() extracts the first word of each string in risk_code, and str_to_title() converts it to title case. Both functions are vectorized, so no loop is needed. Note that this only gives the wanted value for rows where the keyword is the first word:

library(stringr)

str_to_title(word(df$risk_code, 1, 1))
#[1] "Deep"         "Construction" "Claims"       "Offshore"

If the keyword is not always the first word and you only need specific words, you can extract them with a regular expression. Rows without a match return NA:

str_to_title(str_extract(tolower(df$risk_code), 'digging|offshore|onshore'))
#[1] "Digging"  NA         "Onshore"  "Offshore" 
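
To keep the existing area values and use the extracted keyword only where area is "--", the two vectors can be combined with ifelse(). A minimal sketch, assuming df holds the sample data from the question (the data frame below is built by hand for illustration):

```r
library(stringr)

# Sample data copied from the question (assumption: df has these two columns)
df <- data.frame(
  risk_code = c("DEEP DIGGING ALL", "CONSTRUCTION PRO",
                "CLAIMS ONSHORE", "OFFSHORE CLAIMS"),
  area = c("--", "Construction", "--", "--"),
  stringsAsFactors = FALSE
)

# Keyword found in each risk_code, or NA when none of the words occurs
extracted <- str_to_title(
  str_extract(tolower(df$risk_code), "digging|offshore|onshore")
)

# Use the extracted keyword only where area is the "--" placeholder
df$area_new <- ifelse(df$area == "--", extracted, df$area)

print(df$area_new)
```

A row that has "--" in area and none of the keywords in risk_code ends up as NA in area_new, which makes unmatched rows easy to spot.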

Upvotes: 1
