Reputation: 24535
I have following dataframe:
olddf <- structure(list(test = structure(1:6, .Label = c("test1", "test2",
"test3", "test4", "test5", "test6"), class = "factor"), month0_gp1 = c("163±28",
"133±20", "177±29", "153±30", "161±31", "159±23"), month0_gp2 = c("122±17",
"167±20", "146±26", "150±27", "148±33", "161±37"), month1_gp1 = c("157±32",
"152±37", "151±24", "143±25", "144±29", "126±30"), month1_gp2 = c("181±14",
"133±34", "152±38", "144±30", "148±20", "137±19"), month3_gp1 = c("139±38",
"161±39", "166±38", "162±39", "151±38", "155±38"), month3_gp2 = c("151±40",
"161±33", "137±25", "161±31", "168±30", "147±34")), .Names = c("test",
"month0_gp1", "month0_gp2", "month1_gp1", "month1_gp2", "month3_gp1",
"month3_gp2"), row.names = c(NA, 6L), class = "data.frame")
test month0_gp1 month0_gp2 month1_gp1 month1_gp2 month3_gp1 month3_gp2
1 test1 163±28 122±17 157±32 181±14 139±38 151±40
2 test2 133±20 167±20 152±37 133±34 161±39 161±33
3 test3 177±29 146±26 151±24 152±38 166±38 137±25
4 test4 153±30 150±27 143±25 144±30 162±39 161±31
5 test5 161±31 148±33 144±29 148±20 151±38 168±30
6 test6 159±23 161±37 126±30 137±19 155±38 147±34
I have to split columns 2:7 into 2 each (one for mean and other for sd):
test month0_gp1_mean month0_gp1_sd month0_gp2_mean month0_gp2_sd month1_gp1_mean month1_gp1_sd ....
I checked earlier posts and used do.call(rbind...
method:
mydf <- data.frame(do.call(rbind, strsplit(olddf$month0_gp1,'±')))
mydf
X1 X2
1 163 28
2 133 20
3 177 29
4 153 30
5 161 31
6 159 23
But this works for one column at a time. How can I modify this to loop for 2:7 columns, and combine them to form one new dataframe? Thanks for your help.
Upvotes: 3
Views: 316
Reputation: 193517
First, get my cSplit
function from this GitHub Gist.
Second, split it up:
cSplit(olddf, 2:ncol(olddf), sep = "±")
# test 2_1 2_2 3_1 3_2 4_1 4_2 5_1 5_2 6_1 6_2 7_1 7_2
# 1: test1 163 28 122 17 157 32 181 14 139 38 151 40
# 2: test2 133 20 167 20 152 37 133 34 161 39 161 33
# 3: test3 177 29 146 26 151 24 152 38 166 38 137 25
# 4: test4 153 30 150 27 143 25 144 30 162 39 161 31
# 5: test5 161 31 148 33 144 29 148 20 151 38 168 30
# 6: test6 159 23 161 37 126 30 137 19 155 38 147 34
If you want to do the column renaming in the same step, try:
Nam <- names(olddf)[2:ncol(olddf)]
setnames(
cSplit(olddf, 2:ncol(olddf), sep = "±"),
c("test", paste(rep(Nam, each = 2), c("mean", "sd"), sep = "_")))[]
Another option would be to look at dplyr
+ tidyr
.
Here's the best I could come up with, but I'm not sure if this is the correct way to do this with these tools....
olddf %>%
gather(GM, value, -test) %>% # Makes the data somewhat long
separate(value, c("MEAN", "SD")) %>% # Splits "value" column. We're wide again
gather(MSD, value, -test, -GM) %>% # Makes the data long again
unite(var, GM, MSD) %>% # Combines GM and MSD columns
spread(var, value) # Goes from wide to long
This is sort of the equivalent of melt
ing the data once, using colsplit
on the resulting "value" column, melt
ing the data again, and using dcast
to get the wide format.
Upvotes: 6
Reputation: 109874
Here's a qdap approach:
library(qdap)
for(i in seq(2, 13, by = 2)){
olddf <- colsplit2df(olddf, i,
paste0(names(olddf)[i], "_", c("mean", "sd")), sep = "±")
}
olddf[,-1] <- lapply(olddf[,-1], as.numeric)
olddf
I looked at Ananda's splitstackshape
package first as I figured there was an easy way to do this but I couldn't figure out a way.
Not sure if you need the last line converting the columns to numeric but assumed you would.
Upvotes: 2