Reputation: 321
I am handling a large data set that stores a 9-character ID in two columns, say ID_part_1 and ID_part_2.
ID_part_1 is a common identifier for a top-level specification, so its values are duplicated throughout that column, while ID_part_2 is unique within each ID_part_1. I want to combine part 1 with part 2 and then drop the last character (letter or digit) of the resulting string.
See the example data below:
ID_part_1  ID_part_2  Comb_ID
G12345     678        G1234567
G12345     679        G1234567
A23567     9C1        A235679C
123456     789        12345678
All data is stored in a data.table, say my_data.dt, so the columns can be addressed easily. Both columns, ID_part_1 and ID_part_2, are of type "character". The computed results should be stored in the column Comb_ID. After trimming the last character off the combined string, I will extract all unique values from the computed column with:
unique(my_data.dt[, Comb_ID])
Upvotes: 1
Views: 60
Reputation: 887128
We can use substr with paste0 in base R:
# Append the first two characters of ID_part_2 (assumes it is always three characters long)
my_data.dt$Comb_ID <- with(my_data.dt,
    paste0(ID_part_1, substr(ID_part_2, 1, 2)))
my_data.dt$Comb_ID
#[1] "G1234567" "G1234567" "A235679C" "12345678"
NOTE: No packages are needed
my_data.dt <- structure(list(ID_part_1 = c("G12345", "G12345", "A23567", "123456"
), ID_part_2 = c("678", "679", "9C1", "789"), Comb_ID = c("G1234567",
"G1234567", "A235679C", "12345678")), class = "data.frame", row.names = c(NA,
-4L))
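If ID_part_2 is not guaranteed to be exactly three characters long, a slightly more general base R sketch is to paste the two parts first and then drop the last character using nchar. The names df, full_id and unique_ids are only illustrative, and the data are copied from the question:

```r
# Example data copied from the question (both columns are character)
df <- data.frame(
  ID_part_1 = c("G12345", "G12345", "A23567", "123456"),
  ID_part_2 = c("678", "679", "9C1", "789"),
  stringsAsFactors = FALSE
)

# Combine the two parts, then drop the final character of each string,
# whatever the length of the combined ID is
full_id <- paste0(df$ID_part_1, df$ID_part_2)
df$Comb_ID <- substr(full_id, 1, nchar(full_id) - 1)

# Distinct trimmed IDs, as asked for in the question
unique_ids <- unique(df$Comb_ID)
print(unique_ids)
```

For the sample rows this should give the same Comb_ID values as the substr(ID_part_2, 1, 2) version, since every ID_part_2 there has three characters.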
Upvotes: 1
Reputation: 1972
An option based on the tidyverse:
library(dplyr)
library(stringr)
# str_c() is vectorised, so it pastes the two columns row by row;
# str_sub(x, 1, -2) then keeps everything except the last character
data %>%
  mutate(Comb_ID = str_sub(str_c(ID_part_1, ID_part_2), 1, -2))
# ID_part_1 ID_part_2 Comb_ID
# 1: G12345 678 G1234567
# 2: G12345 679 G1234567
# 3: A23567 9C1 A235679C
# 4: 123456 789 12345678
Data
data <- structure(list(ID_part_1 = c("G12345", "G12345", "A23567", "123456"
), ID_part_2 = c("678", "679", "9C1", "789")), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
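Since the question keeps its data in a data.table, the same trimming could also be sketched with data.table's own := operator, which adds the column by reference. This is only a minimal sketch: it assumes the data.table package is installed, rebuilds the sample data under the question's name my_data.dt, and uses full_id as an illustrative helper name.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
my_data.dt <- data.table(
  ID_part_1 = c("G12345", "G12345", "A23567", "123456"),
  ID_part_2 = c("678", "679", "9C1", "789")
)

# Add Comb_ID by reference: paste both parts, then drop the last character
my_data.dt[, Comb_ID := {
  full_id <- paste0(ID_part_1, ID_part_2)
  substr(full_id, 1, nchar(full_id) - 1)
}]

# Unique trimmed IDs, using the call from the question
unique_ids <- unique(my_data.dt[, Comb_ID])
print(unique_ids)
```

Because := modifies my_data.dt in place, no reassignment is needed, which may matter for a large data set.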
Upvotes: 1