Reputation: 17
I want to a character column into multiple numerical columns.
For example, I have these columns in table "final":
final
sd2 | file |
---|---|
0.011 | r100.0.8.1.1.csv |
0.023 | r10.0.2.0.1.csv |
0.033 | r10.0.95.0.0.csv |
I want to turn it into something like below. The "file" column is separated into 3 separate columns where there is no longer an "r" or ".csv" and the first two numbers (ie 100 and 0.8) are in different columns but the last one (ie 1.1) remain in one column- although this could be further separated into two columns if easier (ie the 1 and 1 in the "dom" column could be in separate columns).
sd2 | pop | sel | dom |
---|---|---|---|
0.011 | 100 | 0.8 | 1.1 |
0.023 | 10 | 0.2 | 0.1 |
0.033 | 10 | 0.95 | 0.0 |
I've used strsplit to make the "file" column a list of character strings.
files <- final$file
files <- as.character(files)
per <- strsplit(files, "[.]")
I'm not sure how to turn these strings into separated columns, while forgoing the "r" and ".col" Any suggestions on next steps would be appreciated!
Upvotes: 1
Views: 196
Reputation: 72813
An option is to strsplit
at letters or point, make a matrix out of non-empty elements whose interrelated columns we paste
together with "."
.
u <- unlist(strsplit(d$file, "[a-z]|\\."))
u <- matrix(u[nchar(u) > 0],,5,b=T)
res <- data.frame(sd2=d$sd2, sapply(list(pop=1, sel=2:3, dom=4:5), function(x)
as.double(apply(u[, x, drop=F], 1, paste, collapse="."))))
res
# sd2 pop sel dom
# 1 0.011 100 0.80 1.1
# 2 0.023 10 0.20 0.1
# 3 0.033 10 0.95 0.0
Data:
d <- structure(list(sd2 = c(0.011, 0.023, 0.033), file = c("r100.0.8.1.1.csv",
"r10.0.2.0.1.csv", "r10.0.95.0.0.csv")), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1
Reputation: 887118
An option with read.csv
from base R
. Here, we can capture the substring as a group, modify the delimiter with ,
, and use read.csv
to read the column as a data.frame
and assign new columns to 'final' dataset in base R
final[c("pop", "sel", "dom")] <- read.csv(text =
sub("^[a-z](\\d+)\\.(\\d+\\.\\d+)\\.([0-9.]+)\\.csv",
"\\1,\\2,\\3", final$file), header = FALSE)
The same option can be used in extract
from tidyr
library(tidyr)
final %>%
extract(file, into = c("pop", "sel", "dom"),
"^[a-z](\\d+)\\.(\\d+\\.\\d+)\\.([0-9.]+)\\.csv", convert = TRUE)
-output
# sd2 pop sel dom
#1 0.011 100 0.80 1.1
#2 0.023 10 0.20 0.1
#3 0.033 10 0.95 0.0
final <- structure(list(sd2 = c(0.011, 0.023, 0.033), file = c("r100.0.8.1.1.csv",
"r10.0.2.0.1.csv", "r10.0.95.0.0.csv")), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 1