flightless13wings
flightless13wings

Reputation: 109

Pattern Matching to create a new column in R

I have a data table like the following:

library(data.table)
DF <- as.data.table(list(ID = c(125534,"122-343",312343,"12343-343FGV", 1234, 713827), Product = c('Y', NA, NA, 'Z', NA, NA), Type = c(NA, 'D', 'G', NA, NA, NA)))

            ID Product Type 
1:       125534       Y   NA       
2:      122-343      NA    D      
3:       312343      NA    G      
4: 12343-343FGV       Z   NA     
5:         1234      NA   NA     
6:       713827      NA   NA      

I would like to create a new column named CATEGORY based on how ID is categorized. My incorrect code looks like the following:

DF$CATEGORY <- ifelse(grepl("^12[0-9]|^31[0-9]|", DF$ID), 'IN', 'OUT')

Desired result:

            ID Product Type CATEGORY
1:       125534       Y   NA       IN
2:      122-343      NA    D      OUT
3:       312343      NA    G       IN
4: 12343-343FGV       Z   NA      OUT
5:         1234      NA   NA      OUT
6:       713827      NA   NA      OUT

I wanted to code it so that any ID with alphabetical letters or symbols, shorter than 6 characters and do not begin with either 12 or 31 are out. The remainder are in.

Upvotes: 1

Views: 564

Answers (2)

akrun
akrun

Reputation: 886948

We can also do this by creating the 'CATEGORY' column with "OUT" values and then specify the 'i' with logical index that only matches the criteria for "IN" cases and assign (:=) the 'CATEGORY' to "IN"

DF[, CATEGORY := "OUT"][grepl("^(12|31)[0-9]{4,}$", ID), CATEGORY := "IN"]
DF
#             ID Product Type CATEGORY
#1:       125534       Y   NA       IN
#2:      122-343      NA    D      OUT
#3:       312343      NA    G       IN
#4: 12343-343FGV       Z   NA      OUT
#5:         1234      NA   NA      OUT
#6:       713827      NA   NA      OUT

Upvotes: 1

ddunn801
ddunn801

Reputation: 1890

I think you mean this:

DF[, CATEGORY := ifelse(grepl("[^0-9]", ID) | 
                          nchar(ID) < 6 | 
                          !grepl("^12|^31", ID), 
                        "OUT", "IN")]

Upvotes: 2

Related Questions