William88
William88

Reputation: 39

How to create new columns by extracting numbers from a vector column with regex?

I need to extract numbers from a vector column using regex and create columns for each number. The length of the vector will not be the same for each row.

For other purposes such as counting the number of elements within the vector I have used

str_count(DATA$vectorCOL, '[0-9.+]+')

This is the data column

vectorCOL
63.
11., 36., 45+1., 79., 90+1.

45., 80., 87.

Expected output

vectorCOL                    col1  col2 col3  col4 col5
63.                          63    NA   NA    NA   NA
11., 36., 45+1., 79., 90+1.  11    36   45+1  79   90+1
                             NA    NA   NA    NA   NA
45., 80., 87.                45    80   87    NA   NA

Upvotes: 0

Views: 40

Answers (2)

s_baldur
s_baldur

Reputation: 33488

Using data.table:

df <- df[, c(vectorCOL = list(vectorCOL), tstrsplit(vectorCOL, ","))]
setnames(df, names(df), sub("V", "col", names(df)))
df
#                      vectorCOL col2 col3   col4 col5   col6
# 1:                         63.  63. <NA>   <NA> <NA>   <NA>
# 2: 11., 36., 45+1., 79., 90+1.  11.  36.  45+1.  79.  90+1.
# 3:                             <NA> <NA>   <NA> <NA>   <NA>
# 4:               45., 80., 87.  45.  80.    87. <NA>   <NA>

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388907

We can use cSplit

splitstackshape::cSplit(df, "vectorCOL", sep = ",", drop = FALSE)
#                vectorCOL vectorCOL_1 vectorCOL_2 vectorCOL_3 vectorCOL_4 vectorCOL_5
#1:                     63.          63          NA        <NA>          NA        <NA>
#2: 11.,36.,45+1.,79.,90+1.          11          36       45+1.          79       90+1.
#3:                                  NA          NA        <NA>          NA        <NA>
#4:             45.,80.,87.          45          80         87.          NA        <NA>

If we don't want "." in the output, we can remove them first using gsub.

df$vectorCOL <- gsub("\\.", "",df$vectorCOL)

data

df <-  structure(list(vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.", 
"", "45., 80., 87.")), row.names = c(NA, -4L), class = "data.frame")

Upvotes: 1

Related Questions