Reputation: 39
I need to extract numbers from a vector column using regex and create columns for each number. The length of the vector will not be the same for each row.
For other purposes such as counting the number of elements within the vector I have used
str_count(DATA$vectorCOL, '[0-9.+]+')
This is the data column
vectorCOL
63.
11., 36., 45+1., 79., 90+1.
45., 80., 87.
Expected output
vectorCOL col1 col2 col3 col4 col5
63. 63 NA NA NA NA
11., 36., 45+1., 79., 90+1. 11 36 45+1 79 90+1
NA NA NA NA NA
45., 80., 87. 45 80 87 NA NA
Upvotes: 0
Views: 40
Reputation: 33488
Using data.table
:
df <- df[, c(vectorCOL = list(vectorCOL), tstrsplit(vectorCOL, ","))]
setnames(df, names(df), sub("V", "col", names(df)))
df
# vectorCOL col2 col3 col4 col5 col6
# 1: 63. 63. <NA> <NA> <NA> <NA>
# 2: 11., 36., 45+1., 79., 90+1. 11. 36. 45+1. 79. 90+1.
# 3: <NA> <NA> <NA> <NA> <NA>
# 4: 45., 80., 87. 45. 80. 87. <NA> <NA>
Upvotes: 0
Reputation: 388907
We can use cSplit
splitstackshape::cSplit(df, "vectorCOL", sep = ",", drop = FALSE)
# vectorCOL vectorCOL_1 vectorCOL_2 vectorCOL_3 vectorCOL_4 vectorCOL_5
#1: 63. 63 NA <NA> NA <NA>
#2: 11.,36.,45+1.,79.,90+1. 11 36 45+1. 79 90+1.
#3: NA NA <NA> NA <NA>
#4: 45.,80.,87. 45 80 87. NA <NA>
If we don't want "." in the output, we can remove them first using gsub
.
df$vectorCOL <- gsub("\\.", "",df$vectorCOL)
data
df <- structure(list(vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.",
"", "45., 80., 87.")), row.names = c(NA, -4L), class = "data.frame")
Upvotes: 1