user13097399
user13097399

Reputation: 17

How to turn convert one character column into multiple numerical columns in R

I want to a character column into multiple numerical columns.

For example, I have these columns in table "final":

final

sd2 file
0.011 r100.0.8.1.1.csv
0.023 r10.0.2.0.1.csv
0.033 r10.0.95.0.0.csv

I want to turn it into something like below. The "file" column is separated into 3 separate columns where there is no longer an "r" or ".csv" and the first two numbers (ie 100 and 0.8) are in different columns but the last one (ie 1.1) remain in one column- although this could be further separated into two columns if easier (ie the 1 and 1 in the "dom" column could be in separate columns).

sd2 pop sel dom
0.011 100 0.8 1.1
0.023 10 0.2 0.1
0.033 10 0.95 0.0

I've used strsplit to make the "file" column a list of character strings.

files <- final$file
files <- as.character(files)

per <- strsplit(files, "[.]")

I'm not sure how to turn these strings into separated columns, while forgoing the "r" and ".col" Any suggestions on next steps would be appreciated!

Upvotes: 1

Views: 196

Answers (2)

jay.sf
jay.sf

Reputation: 72813

An option is to strsplit at letters or point, make a matrix out of non-empty elements whose interrelated columns we paste together with ".".

u <- unlist(strsplit(d$file, "[a-z]|\\."))
u <- matrix(u[nchar(u) > 0],,5,b=T)
res <- data.frame(sd2=d$sd2, sapply(list(pop=1, sel=2:3, dom=4:5), function(x) 
  as.double(apply(u[, x, drop=F], 1, paste, collapse="."))))
res
#     sd2 pop  sel dom
# 1 0.011 100 0.80 1.1
# 2 0.023  10 0.20 0.1
# 3 0.033  10 0.95 0.0

Data:

d <- structure(list(sd2 = c(0.011, 0.023, 0.033), file = c("r100.0.8.1.1.csv", 
"r10.0.2.0.1.csv", "r10.0.95.0.0.csv")), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

akrun
akrun

Reputation: 887118

An option with read.csv from base R. Here, we can capture the substring as a group, modify the delimiter with ,, and use read.csv to read the column as a data.frame and assign new columns to 'final' dataset in base R

final[c("pop", "sel", "dom")] <-  read.csv(text = 
  sub("^[a-z](\\d+)\\.(\\d+\\.\\d+)\\.([0-9.]+)\\.csv",
       "\\1,\\2,\\3", final$file), header = FALSE)

The same option can be used in extract from tidyr

library(tidyr)
final %>%
    extract(file, into = c("pop", "sel", "dom"),
         "^[a-z](\\d+)\\.(\\d+\\.\\d+)\\.([0-9.]+)\\.csv", convert = TRUE)

-output

#    sd2 pop  sel dom
#1 0.011 100 0.80 1.1
#2 0.023  10 0.20 0.1
#3 0.033  10 0.95 0.0

data

final <- structure(list(sd2 = c(0.011, 0.023, 0.033), file = c("r100.0.8.1.1.csv", 
"r10.0.2.0.1.csv", "r10.0.95.0.0.csv")), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

Related Questions