Reputation: 407
I have a column with these kind of values
id count total SEXO EDAD IDENTIF_AFILIADO
1: 952815090_12_06_Q643 4 133.34 M 39 952815090
2: 952443257_10_17_C64 9 64.32 F 5 952443257
3: 931131767_9_10_C716 2 21.88 M 1 931131767
4: 931131767_8_13_C716 15 173.70 M 1 931131767
5: 931131767_1_09_C716 1 10.94 M 0 931131767
.....
The id column has a code after the third " _ ". For instance, the first row has "952815090_12_06_Q643"
I need to extrac the code Q643.
More specifically the group of characters after the third "_" in every row. How to perform it using R?
Upvotes: 0
Views: 39
Reputation: 388862
You can delete everything until last underscore.
sub('.*_', '', df$id)
#[1] "Q643" "C64" "C716" "C716" "C716"
data
df <- structure(list(id = c("952815090_12_06_Q643", "952443257_10_17_C64",
"931131767_9_10_C716", "931131767_8_13_C716", "931131767_1_09_C716"
), count = c(4L, 9L, 2L, 15L, 1L), total = c(133.34, 64.32, 21.88,
173.7, 10.94), SEXO = c("M", "F", "M", "M", "M"), EDAD = c(39L,
5L, 1L, 1L, 0L), IDENTIF_AFILIADO = c(952815090L, 952443257L,
931131767L, 931131767L, 931131767L)),
class = "data.frame", row.names = c(NA, -5L))
Upvotes: 0
Reputation: 5429
This should do it:
your.ids <- sapply( dat$id, function(id) {
strsplit( id, "_" )[[1]][4]
})
Or if this is a data.table, perhaps something like this:
dat[, idstring := tstrsplit( id, "_", fixed=T )[4] ]
Applied to your code it looks like this:
dat <- read.table(text=
" id count total SEXO EDAD IDENTIF_AFILIADO
1: 952815090_12_06_Q643 4 133.34 M 39 952815090
2: 952443257_10_17_C64 9 64.32 F 5 952443257
3: 931131767_9_10_C716 2 21.88 M 1 931131767
4: 931131767_8_13_C716 15 173.70 M 1 931131767
5: 931131767_1_09_C716 1 10.94 M 0 931131767
") %>% as.data.table
dat[, idstring := tstrsplit( id, "_", fixed=T )[4] ]
print( dat )
Output:
id count total SEXO EDAD IDENTIF_AFILIADO idstring
1: 952815090_12_06_Q643 4 133.34 M 39 952815090 Q643
2: 952443257_10_17_C64 9 64.32 F 5 952443257 C64
3: 931131767_9_10_C716 2 21.88 M 1 931131767 C716
4: 931131767_8_13_C716 15 173.70 M 1 931131767 C716
5: 931131767_1_09_C716 1 10.94 M 0 931131767 C716
Upvotes: 0