Reputation: 2054
In R, I have the following dataframe:
Name Category
1 Beans 1.12.5
2 Pears 5.7.9
3 Eggs 10.6.5
What I would like to have is the following:
Name Cat1 Cat2 Cat3
1 Beans 1 12 5
2 Pears 5 7 9
3 Eggs 10 6 5
Ideally some expression built inside plyr would be nice...
I will investigate on my side but as searching this might take me a lot of time I was just wondering if some of you do have some hints to perform this...
Upvotes: 2
Views: 207
Reputation: 193517
I've written a function concat.split
(a "family" of functions, actually) as part of my splitstackshape
package for dealing with these types of problems:
# install.packages("splitstackshape")
library(splitstackshape)
concat.split(mydf, "Category", ".", drop=TRUE)
# Name Category_1 Category_2 Category_3
# 1 Beans 1 12 5
# 2 Pears 5 7 9
# 3 Eggs 10 6 5
It also works nicely on "unbalanced" data.
dat <- data.frame(Name = c("Beans", "Pears", "Eggs"),
Category = c("1.12.5", "5.7.9.8", "10.6.5.7.7"))
concat.split(dat, "Category", ".", drop = TRUE)
# Name Category_1 Category_2 Category_3 Category_4 Category_5
# 1 Beans 1 12 5 NA NA
# 2 Pears 5 7 9 8 NA
# 3 Eggs 10 6 5 7 7
Because "long" or "molten" data are often required in these types of situations, the concat.split.multiple
function has a "long" argument too:
concat.split.multiple(dat, "Category", ".", direction = "long")
# Name time Category
# 1 Beans 1 1
# 2 Pears 1 5
# 3 Eggs 1 10
# 4 Beans 2 12
# 5 Pears 2 7
# 6 Eggs 2 6
# 7 Beans 3 5
# 8 Pears 3 9
# 9 Eggs 3 5
# 10 Beans 4 NA
# 11 Pears 4 8
# 12 Eggs 4 7
# 13 Beans 5 NA
# 14 Pears 5 NA
# 15 Eggs 5 7
Upvotes: 7
Reputation: 93813
If you have a consistent number of categories, then this will work:
#recreate your data first:
dat <- data.frame(Name = c("Beans", "Pears", "Eggs"), Category = c("1.12.5",
"5.7.9", "10.6.5"),stringsAsFactors=FALSE)
spl <- strsplit(dat$Category,"\\.")
len <- sapply(spl,length)
dat[paste0("cat",1:max(len))] <- t(sapply(spl,as.numeric))
Result:
dat
Name Category cat1 cat2 cat3
1 Beans 1.12.5 1 12 5
2 Pears 5.7.9 5 7 9
3 Eggs 10.6.5 10 6 5
If you have differing numbers of separated values, then this should account for it:
#example unbalanced data
dat <- data.frame(Name = c("Beans", "Pears", "Eggs"), Category = c("1.12.5",
"5.7.9", "10.6.5"),stringsAsFactors=FALSE)
dat$Category[2] <- "5.7"
spl <- strsplit(dat$Category,"\\.")
len <- sapply(spl,length)
spl <- Map(function(x,y) c(x,rep(NA,max(len)-y)), spl, len)
dat[paste0("cat",1:max(len))] <- t(sapply(spl,as.numeric))
Result:
Name Category cat1 cat2 cat3
1 Beans 1.12.5 1 12 5
2 Pears 5.7 5 7 NA
3 Eggs 10.6.5 10 6 5
Upvotes: 2
Reputation: 109874
The qdap package has the colsplit2df
for just these sort of situations:
#recreate your data first:
dat <- data.frame(Name = c("Beans", "Pears", "Eggs"), Category = c("1.12.5",
"5.7.9", "10.6.5"),stringsAsFactors=FALSE)
library(qdap)
colsplit2df(dat, 2, paste0("cat", 1:3))
## > colsplit2df(dat, 2, paste0("cat", 1:3))
## Name cat1 cat2 cat3
## 1 Beans 1 12 5
## 2 Pears 5 7 9
## 3 Eggs 10 6 5
Upvotes: 2