I have this data frame:
a <- c(1, 2, 3, 4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a, b)
names(df) <- c('ID', 'Jobs')
I want to group the jobs into categories: if a job title contains "Software", "Data", or "Computer", its category is "IT"; otherwise it is "OTH". The result should look like this:
ID  Jobs               Category
1   Software Engineer  IT
2   Data Engineer      IT
3   HR Officer         OTH
4   Marketing Manager  OTH
5   Computer Engineer  IT
In Python I can use df["Jobs"].str.contains("Software|Data|Computer", na = False) combined with np.select to get the category. However, I don't know how to do this in R. Any advice on how to solve it?
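A rough base-R analogue of that pandas pattern might look like the sketch below: grepl() returns the same kind of logical vector as str.contains(), and a short loop imitates np.select() (the first matching condition wins, everything else gets a default). select_first() is a hypothetical helper written only for this sketch, not a base R or package function, and the extra "MGMT" category is an invented example to show more than one condition.

```r
# Sketch: imitating pandas str.contains() + np.select() in base R.
# select_first() is a hypothetical helper, not part of base R.
select_first <- function(conditions, choices, default = NA) {
  n <- length(conditions[[1]])
  out <- rep(default, n)
  decided <- rep(FALSE, n)
  for (i in seq_along(conditions)) {
    cond <- conditions[[i]]
    # NA counts as "no match"; earlier conditions take priority, as in np.select
    hit <- !decided & !is.na(cond) & cond
    out[hit] <- choices[[i]]
    decided <- decided | hit
  }
  out
}

df <- data.frame(ID = 1:5,
                 Jobs = c("Software Engineer", "Data Engineer", "HR Officer",
                          "Marketing Manager", "Computer Engineer"),
                 stringsAsFactors = FALSE)

# grepl() plays the role of str.contains()
is_it <- grepl("Software|Data|Computer", df$Jobs)
df$Category <- select_first(list(is_it), list("IT"), default = "OTH")
df$Category
# -> "IT" "IT" "OTH" "OTH" "IT"

# With a second (hypothetical) category, the first matching condition still wins
is_mgmt <- grepl("Manager", df$Jobs)
select_first(list(is_it, is_mgmt), list("IT", "MGMT"), default = "OTH")
# -> "IT" "IT" "OTH" "MGMT" "IT"
```

For a single yes/no split this is more machinery than needed; it only pays off when there are several categories to pick from in priority order.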
Here is my solution:
a <- c(1,2,3,4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a,b)
names(df) <- c('ID', 'Jobs')
df
  ID              Jobs
1  1 Software Engineer
2  2     Data Engineer
3  3        HR Officer
4  4 Marketing Manager
5  5 Computer Engineer
# Add job category: start with NA, mark matches as "IT", fill the rest with "OTH"
df$Category <- NA_character_
df$Category[grep("Software|Data|Computer", df$Jobs)] <- "IT"
df$Category[is.na(df$Category)] <- "OTH"
df
  ID              Jobs Category
1  1 Software Engineer       IT
2  2     Data Engineer       IT
3  3        HR Officer      OTH
4  4 Marketing Manager      OTH
5  5 Computer Engineer       IT
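The grep() call in this answer returns the positions of the matching rows, not one TRUE/FALSE per row (that is what grepl() does). The short check below, on the same titles, shows both, and also why assigning by position into a vector that does not exist yet needs care: the vector only grows as far as the last match. The variable names are just for illustration.

```r
jobs <- c("Software Engineer", "Data Engineer", "HR Officer",
          "Marketing Manager", "Computer Engineer")

pos <- grep("Software|Data|Computer", jobs)   # integer positions: 1 2 5
lgl <- grepl("Software|Data|Computer", jobs)  # logical: TRUE TRUE FALSE FALSE TRUE
identical(which(lgl), pos)                    # TRUE: both select the same rows

# Assigning by position into NULL pads the gaps with NA ...
category <- NULL
category[pos] <- "IT"
category                                      # "IT" "IT" NA NA "IT"

# ... but only up to the last match. Here only element 1 matches,
# so the result has length 1 instead of 5.
short <- NULL
short[grep("Software", jobs)] <- "IT"
length(short)                                 # 1
```

When a vector that short is assigned as a data frame column, R either stops with a "replacement has ... rows" error or, if the row count is a multiple of its length, silently recycles it (in the last case every row would become "IT"). Creating the column first, for example as NA_character_, avoids both.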
We can use grepl to get a logical vector that is TRUE where the 'Jobs' column matches 'Software', 'Data', or 'Computer'. Adding 1 turns it into a numeric index (1 for FALSE, 2 for TRUE), which picks 'OTH' or 'IT' from c("OTH", "IT"):
df$Category <- c("OTH", "IT")[(grepl("Software|Data|Computer", df$Jobs) + 1)]
df$Category
#[1] "IT" "IT" "OTH" "OTH" "IT"
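Spelled out step by step, the indexing trick relies on TRUE/FALSE being coerced to 1/0 in arithmetic. The standalone jobs vector here is only for illustration:

```r
jobs <- c("Software Engineer", "Data Engineer", "HR Officer",
          "Marketing Manager", "Computer Engineer")

is_it <- grepl("Software|Data|Computer", jobs)  # TRUE TRUE FALSE FALSE TRUE
idx <- is_it + 1                                # 2 2 1 1 2 (FALSE -> 1, TRUE -> 2)
c("OTH", "IT")[idx]                             # "IT" "IT" "OTH" "OTH" "IT"
```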
Or use ifelse with grepl:
ifelse(grepl("Software|Data|Computer", df$Jobs), "IT", "OTH")
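Two details of grepl() matter for the ifelse() version: it matches case-sensitively unless ignore.case = TRUE is passed, and it returns FALSE for missing titles, so NA rows end up as "OTH", much like na = False in the pandas code. The titles below are made up for illustration.

```r
# Hypothetical titles: one lower-case, one missing
jobs <- c("Software Engineer", "data analyst", NA, "HR Officer")

# Default: case-sensitive, so "data analyst" is not matched; NA gives FALSE
ifelse(grepl("Software|Data|Computer", jobs), "IT", "OTH")
# -> "IT" "OTH" "OTH" "OTH"

# Case-insensitive matching
ifelse(grepl("Software|Data|Computer", jobs, ignore.case = TRUE), "IT", "OTH")
# -> "IT" "IT" "OTH" "OTH"
```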
Data used:
df <- structure(list(ID = c(1, 2, 3, 4, 5),
                     Jobs = structure(c(5L, 2L, 3L, 4L, 1L),
                                      .Label = c("Computer Engineer", "Data Engineer", "HR Officer",
                                                 "Marketing Manager", "Software Engineer"),
                                      class = "factor")),
                class = "data.frame", row.names = c(NA, -5L))