user3483060
user3483060

Reputation: 407

Extraction of characters and symbols using R

I have a column with these kind of values

                    id     count  total SEXO EDAD IDENTIF_AFILIADO
 1:  952815090_12_06_Q643     4 133.34    M   39        952815090
 2:   952443257_10_17_C64     9  64.32    F    5        952443257
 3:   931131767_9_10_C716     2  21.88    M    1        931131767
 4:   931131767_8_13_C716    15 173.70    M    1        931131767
 5:   931131767_1_09_C716     1  10.94    M    0        931131767

.....

The id column has a code after the third " _ ". For instance, the first row has "952815090_12_06_Q643"

I need to extrac the code Q643.

More specifically the group of characters after the third "_" in every row. How to perform it using R?

Upvotes: 0

Views: 39

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388862

You can delete everything until last underscore.

sub('.*_', '', df$id)
#[1] "Q643" "C64"  "C716" "C716" "C716"

data

df <- structure(list(id = c("952815090_12_06_Q643", "952443257_10_17_C64", 
"931131767_9_10_C716", "931131767_8_13_C716", "931131767_1_09_C716"
), count = c(4L, 9L, 2L, 15L, 1L), total = c(133.34, 64.32, 21.88, 
173.7, 10.94), SEXO = c("M", "F", "M", "M", "M"), EDAD = c(39L, 
5L, 1L, 1L, 0L), IDENTIF_AFILIADO = c(952815090L, 952443257L, 
931131767L, 931131767L, 931131767L)), 
class = "data.frame", row.names = c(NA, -5L))

Upvotes: 0

Sirius
Sirius

Reputation: 5429

This should do it:


your.ids <- sapply( dat$id, function(id) {
    strsplit( id, "_" )[[1]][4]
})

Or if this is a data.table, perhaps something like this:


dat[, idstring := tstrsplit( id, "_", fixed=T )[4] ]

Applied to your code it looks like this:

dat <- read.table(text=
"                    id     count  total SEXO EDAD IDENTIF_AFILIADO
 1:  952815090_12_06_Q643     4 133.34    M   39        952815090
 2:   952443257_10_17_C64     9  64.32    F    5        952443257
 3:   931131767_9_10_C716     2  21.88    M    1        931131767
 4:   931131767_8_13_C716    15 173.70    M    1        931131767
 5:   931131767_1_09_C716     1  10.94    M    0        931131767
") %>% as.data.table

dat[, idstring := tstrsplit( id, "_", fixed=T )[4] ]

print( dat )

Output:

                     id count  total SEXO EDAD IDENTIF_AFILIADO idstring
1: 952815090_12_06_Q643     4 133.34    M   39        952815090     Q643
2:  952443257_10_17_C64     9  64.32    F    5        952443257      C64
3:  931131767_9_10_C716     2  21.88    M    1        931131767     C716
4:  931131767_8_13_C716    15 173.70    M    1        931131767     C716
5:  931131767_1_09_C716     1  10.94    M    0        931131767     C716

Upvotes: 0

Brigadeiro
Brigadeiro

Reputation: 2945

Using regular expressions:

gsub("^.*_.*_.*_(.*)$", "\\1", id)

Upvotes: 2

Related Questions