Split multiple columns based on condition and rename

Question

I have a large dataset (about 1 GB) that looks like this:

Column1  Column2  Column3
ID1      1:2=2.3  2:3=7
ID2      1:2=3.2  2:3=8
ID3      1:2=6.5  2:3=10

From this, I would like to create a new dataset:

Column1  1:2  2:3
ID1      2.3  7
ID2      3.2  8
ID3      6.5  10

Basically, I need to split each column on "=", discard the extra columns the split creates, keep only the values, and rename the columns to 1:2, 2:3 (without having to use the colnames function, as I have hundreds of columns).

I am considering creating a loop or a function, which would split the columns using str_split_fixed and merge the desired columns together into a new dataframe.

I am thinking there should be an easier way. Any thoughts would be appreciated!

Sotos · Accepted Answer

Here is one way to do it:

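# New column names: the part before "=" (one unique value per column)
nms <- sapply(df[-1], function(i) unique(sub('=.*', '', i)))
# Cell values: the part after "="
df[-1] <- lapply(df[-1], function(i) sub('.*=', '', i))
# Rename every column except the ID column
names(df)[-1] <- nms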
df
#  Column1 1:2 2:3
#1     ID1 2.3   7
#2     ID2 3.2   8
#3     ID3 6.5   9

Data

structure(list(Column1 = c("ID1", "ID2", "ID3"), Column2 = c("1:2=2.3", 
"1:2=3.2", "1:2=6.5"), Column3 = c("2:3=7", "2:3=8", "2:3=9")), .Names = c("Column1", 
"Column2", "Column3"), row.names = c(NA, -3L), class = "data.frame")
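
Two details of the approach above may matter on a real dataset: sub() returns character strings, so the value columns stay character, and sapply() only returns a plain character vector when each column has exactly one distinct prefix. Below is a minimal sketch of a variant that checks the prefixes and converts the values to numeric. The helper name split_key_value and its id_cols argument are hypothetical (not from any package), and the sketch assumes every cell has the form key=value with no missing values.

```r
# Same example data as in the answer above.
df <- data.frame(
  Column1 = c("ID1", "ID2", "ID3"),
  Column2 = c("1:2=2.3", "1:2=3.2", "1:2=6.5"),
  Column3 = c("2:3=7", "2:3=8", "2:3=9"),
  stringsAsFactors = FALSE
)

# split_key_value() is a hypothetical helper name used for illustration only.
# id_cols gives the positions of the columns that should be left untouched.
split_key_value <- function(df, id_cols = 1) {
  # Part before "=" in each cell; we expect exactly one distinct key per column.
  keys <- lapply(df[-id_cols], function(x) unique(sub("=.*", "", x)))
  stopifnot(all(lengths(keys) == 1))

  # Part after "=" becomes the cell value, converted from character to numeric.
  df[-id_cols] <- lapply(df[-id_cols], function(x) as.numeric(sub(".*=", "", x)))

  # Use the keys as the new column names.
  names(df)[-id_cols] <- unlist(keys, use.names = FALSE)
  df
}

res <- split_key_value(df)
str(res)
```

If a column mixes several prefixes, the stopifnot() call fails instead of silently producing a malformed names vector; whether that is the right behaviour depends on the data.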
