rnso
rnso

Reputation: 24535

Splitting multiple columns in R

I have following dataframe:

olddf <- structure(list(test = structure(1:6, .Label = c("test1", "test2", 
"test3", "test4", "test5", "test6"), class = "factor"), month0_gp1 = c("163±28", 
"133±20", "177±29", "153±30", "161±31", "159±23"), month0_gp2 = c("122±17", 
"167±20", "146±26", "150±27", "148±33", "161±37"), month1_gp1 = c("157±32", 
"152±37", "151±24", "143±25", "144±29", "126±30"), month1_gp2 = c("181±14", 
"133±34", "152±38", "144±30", "148±20", "137±19"), month3_gp1 = c("139±38", 
"161±39", "166±38", "162±39", "151±38", "155±38"), month3_gp2 = c("151±40", 
"161±33", "137±25", "161±31", "168±30", "147±34")), .Names = c("test", 
"month0_gp1", "month0_gp2", "month1_gp1", "month1_gp2", "month3_gp1", 
"month3_gp2"), row.names = c(NA, 6L), class = "data.frame")

   test month0_gp1 month0_gp2 month1_gp1 month1_gp2 month3_gp1 month3_gp2
1 test1     163±28     122±17     157±32     181±14     139±38     151±40
2 test2     133±20     167±20     152±37     133±34     161±39     161±33
3 test3     177±29     146±26     151±24     152±38     166±38     137±25
4 test4     153±30     150±27     143±25     144±30     162±39     161±31
5 test5     161±31     148±33     144±29     148±20     151±38     168±30
6 test6     159±23     161±37     126±30     137±19     155±38     147±34

I have to split columns 2:7 into 2 each (one for mean and other for sd):

test month0_gp1_mean month0_gp1_sd month0_gp2_mean month0_gp2_sd month1_gp1_mean month1_gp1_sd  ....

I checked earlier posts and used do.call(rbind... method:

mydf <- data.frame(do.call(rbind, strsplit(olddf$month0_gp1,'±')))

mydf
   X1 X2
1 163 28
2 133 20
3 177 29
4 153 30
5 161 31
6 159 23

But this works for one column at a time. How can I modify this to loop for 2:7 columns, and combine them to form one new dataframe? Thanks for your help.

Upvotes: 3

Views: 316

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

First, get my cSplit function from this GitHub Gist.

Second, split it up:

cSplit(olddf, 2:ncol(olddf), sep = "±")
#     test 2_1 2_2 3_1 3_2 4_1 4_2 5_1 5_2 6_1 6_2 7_1 7_2
# 1: test1 163  28 122  17 157  32 181  14 139  38 151  40
# 2: test2 133  20 167  20 152  37 133  34 161  39 161  33
# 3: test3 177  29 146  26 151  24 152  38 166  38 137  25
# 4: test4 153  30 150  27 143  25 144  30 162  39 161  31
# 5: test5 161  31 148  33 144  29 148  20 151  38 168  30
# 6: test6 159  23 161  37 126  30 137  19 155  38 147  34

If you want to do the column renaming in the same step, try:

Nam <- names(olddf)[2:ncol(olddf)]
setnames(
  cSplit(olddf, 2:ncol(olddf), sep = "±"), 
  c("test", paste(rep(Nam, each = 2), c("mean", "sd"), sep = "_")))[]

Another option would be to look at dplyr + tidyr.

Here's the best I could come up with, but I'm not sure if this is the correct way to do this with these tools....

olddf %>%
  gather(GM, value, -test) %>%          # Makes the data somewhat long
  separate(value, c("MEAN", "SD")) %>%  # Splits "value" column. We're wide again
  gather(MSD, value, -test, -GM) %>%    # Makes the data long again
  unite(var, GM, MSD) %>%               # Combines GM and MSD columns
  spread(var, value)                    # Goes from wide to long

This is sort of the equivalent of melting the data once, using colsplit on the resulting "value" column, melting the data again, and using dcast to get the wide format.

Upvotes: 6

Tyler Rinker
Tyler Rinker

Reputation: 109874

Here's a qdap approach:

library(qdap)
for(i in seq(2, 13, by = 2)){
    olddf <- colsplit2df(olddf, i, 
        paste0(names(olddf)[i], "_", c("mean", "sd")), sep = "±")
}

olddf[,-1] <- lapply(olddf[,-1], as.numeric)
olddf

I looked at Ananda's splitstackshape package first as I figured there was an easy way to do this but I couldn't figure out a way.

Not sure if you need the last line converting the columns to numeric but assumed you would.

Upvotes: 2

Related Questions