tidyr or dplyr equivalent of JMP split table

Question

JMP has a "split table" platform:
http://www.jmp.com/support/help/Split_Columns.shtml

Here is the image for it:

The "split by" becomes part of the column headers.
The "split columns" are the columns spread out.
The "group" are retained columns.

I have looked at a few links/pages and can't seem to get this right in R. Right now I have to kluge it into a macro in JMP.

Links that didn't help me include:

I need to split a table of ~20k rows and ~30 columns, along one of the columns (integers between 0 and 13), to being ~1400 rows with ~25 split into 350.

An inelegant, but repeatable, example is splitting this cars table

according to this:

Yields this:

How do I do this and retain the ~5 non-split columns using an R library like tidyr or dplyr?

ARobertson · Accepted Answer

Using reshape, it's not too terrible to do one split column at a time. You could then merge the model and engine.disp together. For your real example, you could just change the lists in aggregate and formula in cast.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

agg <- aggregate(list(model = x$model),list(origin = x$origin,cylinders = x$cylinders,year = x$year),FUN = paste,collapse = ',')

require(reshape)
output <- cast(data = agg,formula = origin + cylinders ~ year,value = 'model')

Edit: I haven't checked all possible cases, but this function should work similar to the split tables, or at least give you a good start.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

jmpsplitcol <- function(data,splitby,splitcols,group){

  require(reshape)
  require(tidyr)

  aggsplitlist <- data[ ,names(data) %in% c(splitby,group)]
  aggsplitlist <- lapply(aggsplitlist,`[`)

  agg <- aggregate(list(data[ ,names(data) %in% splitcols]),aggsplitlist,FUN = paste,collapse = ',')

  newgat <- gather_(data = agg,key = 'splitcolname','myval',splitcols)

  castformula <- as.formula(paste(paste(group,collapse = ' + '),'~','splitcolname','+',splitby))
  output <- cast(data = newgat,formula = castformula,value = 'myval')
  output
}
res <- jmpsplitcol(x,c('year'),c('engine.disp','model'),c('origin','cylinders'))
head(res2)

tidyr or dplyr equivalent of JMP split table

Answers (2)

Related Questions