Adam Majdi

Reputation: 107

Insert a space at a specific location in a string

I have a data frame and want to insert a space at a specific location. Here is an example of the data:

0MHOCAN000006026421HOCAN000000392457HOCAN000005311227
0FHOUSA000002272874HOUSA000002272874HOUSA000050206641
0MHOUSA000002272874HOUSA000002076121HOUSA000014569699

And here is what I want to get (a space before any letter H):

0M  HOCAN000006026421  HOCAN000000392457  HOCAN000005311227
0F  HOUSA000002272874  HOUSA000002272874  HOUSA000050206641
0M  HOUSA000002272874  HOUSA000002076121  HOUSA000014569699 

Upvotes: 1

Views: 1407

Answers (2)

zx8754

Reputation: 56269

We can read the data as a fixed-width file:

Base function read.fwf:

x1 <- read.fwf("temp.txt",
               widths = c(2, 17, 17, 17),
               col.names = paste0("myColName",1:4),
               stringsAsFactors = FALSE)
# check output
str(x1)
# 'data.frame': 3 obs. of  4 variables:
# $ myColName1: chr  "0M" "0F" "0M"
# $ myColName2: chr  "HOCAN000006026421" "HOUSA000002272874" "HOUSA000002272874"
# $ myColName3: chr  "HOCAN000000392457" "HOUSA000002272874" "HOUSA000002076121"
# $ myColName4: chr  "HOCAN000005311227" "HOUSA000050206641" "HOUSA000014569699"
x1
#   myColName1        myColName2        myColName3        myColName4
# 1         0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227
# 2         0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641
# 3         0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699

Using read_fwf from readr package:

library(readr)

x2 <- read_fwf("temp.txt",
               fwf_widths(c(2, 17, 17, 17),
                          col_names = paste0("myColName",1:4)))
# check output
str(x2)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of  4 variables:
# $ myColName1: chr  "0M" "0F" "0M"
# $ myColName2: chr  "HOCAN000006026421" "HOUSA000002272874" "HOUSA000002272874"
# $ myColName3: chr  "HOCAN000000392457" "HOUSA000002272874" "HOUSA000002076121"
# $ myColName4: chr  "HOCAN000005311227" "HOUSA000050206641" "HOUSA000014569699"
# - attr(*, "spec")=List of 2
# ..$ cols   :List of 4
# .. ..$ myColName1: list()
# .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
# .. ..$ myColName2: list()
# .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
# .. ..$ myColName3: list()
# .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
# .. ..$ myColName4: list()
# .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
# ..$ default: list()
# .. ..- attr(*, "class")= chr  "collector_guess" "collector"
# ..- attr(*, "class")= chr "col_spec"
x2
# # A tibble: 3 × 4
#   myColName1        myColName2        myColName3        myColName4
#        <chr>             <chr>             <chr>             <chr>
# 1         0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227
# 2         0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641
# 3         0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699

Because they split on fixed column widths rather than on a particular character, these solutions should work even if the IDs do not start with the letter H, or if they contain more than one H.
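If the strings already sit in a character vector or data frame column rather than in a file, the same fixed-width idea can be sketched with base R's substring. This is only a minimal sketch: the widths c(2, 17, 17, 17) are taken from the calls above, and the variable names (x, widths, parts, spaced) are illustrative assumptions.

```r
# Sample strings from the question (assumed to already be in memory)
x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
       "0FHOUSA000002272874HOUSA000002272874HOUSA000050206641",
       "0MHOUSA000002272874HOUSA000002076121HOUSA000014569699")

# Field widths, same as in the read.fwf / read_fwf calls
widths <- c(2, 17, 17, 17)
ends   <- cumsum(widths)        # 2 19 36 53
starts <- ends - widths + 1     # 1  3 20 37

# Cut every string at the fixed positions, then rejoin with one space
parts  <- lapply(x, substring, first = starts, last = ends)
spaced <- vapply(parts, paste, character(1), collapse = " ")

spaced[1]
# [1] "0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227"
```

Like the file-based versions, this relies only on positions, so it does not depend on which letters the IDs contain.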

Upvotes: 2

Wiktor Stribiżew

Reputation: 627607

You can use gsub with a fixed-string replacement:

x <- c("0MHOCAN000006026421HOCAN000000392457HOCAN000005311227",
"0FHOUSA000002272874HOUSA000002272874HOUSA000050206641",
"0MHOUSA000002272874HOUSA000002076121HOUSA000014569699")
gsub("H", " H", x, fixed=TRUE)

Output:

[1] "0M HOCAN000006026421 HOCAN000000392457 HOCAN000005311227"
[2] "0F HOUSA000002272874 HOUSA000002272874 HOUSA000050206641"
[3] "0M HOUSA000002272874 HOUSA000002076121 HOUSA000014569699"

If the column in your data frame df is named col1, you can use:

df$col1 <- gsub("H", " H", df$col1, fixed = TRUE)
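One caveat: a fixed replacement puts a space before every H, including one at the very start of a value. The sample data always begins with a two-character prefix, so this does not arise there. If some values might start with H, a possible guard is to trim leading whitespace afterwards. The sketch below uses a small made-up data frame; the second value is a hypothetical case, not from the question.

```r
# Hypothetical data frame; the second value starts with H on purpose
df <- data.frame(col1 = c("0MHOCAN000006026421HOCAN000000392457",
                          "HOUSA000002272874HOUSA000050206641"),
                 stringsAsFactors = FALSE)

# Insert a space before every H, then drop any space added at the start
df$col1 <- trimws(gsub("H", " H", df$col1, fixed = TRUE), which = "left")

df$col1
# [1] "0M HOCAN000006026421 HOCAN000000392457"
# [2] "HOUSA000002272874 HOUSA000050206641"
```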

Upvotes: 5
