Long_NgV

Reputation: 475

Condition with multiple strings in columns R

I have this dataframe:

a <- c(1, 2, 3, 4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a, b)
names(df) <- c('ID', 'Jobs')

I want to group the jobs into categories. If a job description contains "Software", "Data" or "Computer", the category for that job is "IT"; otherwise the category is "OTH". The result should look like this:

 ID               Jobs  Category
  1  Software Engineer        IT
  2      Data Engineer        IT
  3         HR Officer       OTH
  4  Marketing Manager       OTH
  5  Computer Engineer        IT

In Python I can use df["Jobs"].str.contains("Software|Data|Computer", na = False) combined with np.select to get the Category. However, I don't know how to do this in R. Any advice on how to solve this would be appreciated.

Upvotes: 1

Views: 944

Answers (2)

Farzad Minooei

Reputation: 51

Here is my solution:

a <- c(1,2,3,4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a,b)
names(df) <- c('ID', 'Jobs')
df

  ID              Jobs
1  1 Software Engineer
2  2     Data Engineer
3  3        HR Officer
4  4 Marketing Manager
5  5 Computer Engineer

#Add Job Category

df$Category[grep("Software|Data|Computer", df$Jobs)] <- "IT"
df$Category[is.na(df$Category)] <- "OTH"
df

  ID              Jobs Category
1  1 Software Engineer       IT
2  2     Data Engineer       IT
3  3        HR Officer      OTH
4  4 Marketing Manager      OTH
5  5 Computer Engineer       IT
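
One caveat with the index assignment above: it creates the Category column from the matched positions only, so if the last row were not a match, the new vector could come out shorter than the data frame, and as far as I know R then either raises an error or recycles the values. A minimal sketch that sidesteps this by filling the column with the default first (df2 and its rows are a made-up variation of the data, not from the question):

```r
# Hypothetical variation of the data: the last row is NOT an IT job
df2 <- data.frame(ID = 1:3,
                  Jobs = c("Software Engineer", "Data Engineer", "HR Officer"))

# Fill the whole column with the default first ...
df2$Category <- "OTH"
# ... then overwrite only the rows whose job title matches the pattern
df2$Category[grep("Software|Data|Computer", df2$Jobs)] <- "IT"
df2
```

Because the column already has one value per row before the second assignment, the result does not depend on which rows happen to match.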

Upvotes: 1

akrun

Reputation: 886938

We can use grepl to get a logical vector that is TRUE wherever the 'Jobs' column contains 'Software', 'Data', or 'Computer'. Adding 1 converts it to a numeric index (1 for no match, 2 for a match), which is then used to pick 'OTH' or 'IT' from a two-element vector.

df$Category <- c("OTH", "IT")[(grepl("Software|Data|Computer", df$Jobs) + 1)]
df$Category
#[1] "IT"  "IT"  "OTH" "OTH" "IT"
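
To make the indexing trick easier to follow, here is a small step-by-step sketch on a two-element vector (jobs and is_it are names chosen just for this illustration):

```r
# Illustrative input, not the full data from the question
jobs <- c("Software Engineer", "HR Officer")

# Step 1: logical vector, TRUE where the pattern matches
is_it <- grepl("Software|Data|Computer", jobs)
is_it
# [1]  TRUE FALSE

# Step 2: adding 1 coerces FALSE/TRUE to the positions 1/2
is_it + 1
# [1] 2 1

# Step 3: use those positions to pick from c("OTH", "IT")
c("OTH", "IT")[is_it + 1]
# [1] "IT"  "OTH"
```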

Or use ifelse with grepl

ifelse(grepl("Software|Data|Computer", df$Jobs), "IT", "OTH")
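
The ifelse call returns a character vector, so it needs to be assigned to become a column. The sketch below also shows that grepl returns FALSE for a missing value, which gives the same effect as na = False in the Python version from the question (df3 and its NA row are made up for the illustration):

```r
# Hypothetical data with a missing job title
df3 <- data.frame(ID = 1:3,
                  Jobs = c("Software Engineer", NA, "HR Officer"))

# grepl() gives FALSE for the NA entry, so that row falls into "OTH"
df3$Category <- ifelse(grepl("Software|Data|Computer", df3$Jobs), "IT", "OTH")
df3$Category
# [1] "IT"  "OTH" "OTH"
```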

data

df <- structure(list(ID = c(1, 2, 3, 4, 5), Jobs = structure(c(5L, 
2L, 3L, 4L, 1L), .Label = c("Computer Engineer", "Data Engineer", 
"HR Officer", "Marketing Manager", "Software Engineer"), 
class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 1
