EngrStudent
EngrStudent

Reputation: 2022

tidyr or dplyr equivalent of JMP split table

JMP has a "split table" platform:
http://www.jmp.com/support/help/Split_Columns.shtml

Here is the image for it:

enter image description here

The "split by" becomes part of the column headers.
The "split columns" are the columns spread out.
The "group" are retained columns.

I have looked at a few links/pages and can't seem to get this right in R. Right now I have to kluge it into a macro in JMP.

Links that didn't help me include:

I need to split a table of ~20k rows and ~30 columns, along one of the columns (integers between 0 and 13), to being ~1400 rows with ~25 split into 350.

An inelegant, but repeatable, example is splitting this cars table
enter image description here

according to this:
enter image description here

Yields this:
enter image description here

How do I do this and retain the ~5 non-split columns using an R library like tidyr or dplyr?

Upvotes: 2

Views: 553

Answers (2)

Robbie
Robbie

Reputation: 11

I had this same issue and the suggested solution was close but didn't quite work for my data. I struggle to keep the JMP inputs straight, so I renamed them to make more sense:

  • data = data.frame

  • key_col = "Split By"

  • value_col = "Split Columns"

  • gather_cols = "Group"

     jmpsplitcol <- function(data, key_col, value_col, gather_cols) {
          require(tidyr)
          require(dplyr)
    
          # Group by the gather columns and the key column
          data_grouped <- data %>%
           group_by(across(all_of(gather_cols)), !!sym(key_col)) %>%
           summarize(mean_value = mean(!!sym(value_col), na.rm = TRUE), .groups = 'drop')
    
         # Spread the key column to create new columns
         data_wide <- data_grouped %>%
           spread(key = !!sym(key_col), value = mean_value)
    
         return(data_wide)
     }
    

I was getting a row for each unique value of "key_col", so I used "mean" to hack them back together since the mean of 'x', NA, NA, NA, NA, NA with na.rm = TRUE is just 'x'.

Upvotes: 1

ARobertson
ARobertson

Reputation: 2897

Using reshape, it's not too terrible to do one split column at a time. You could then merge the model and engine.disp together. For your real example, you could just change the lists in aggregate and formula in cast.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

agg <- aggregate(list(model = x$model),list(origin = x$origin,cylinders = x$cylinders,year = x$year),FUN = paste,collapse = ',')

require(reshape)
output <- cast(data = agg,formula = origin + cylinders ~ year,value = 'model')

Edit: I haven't checked all possible cases, but this function should work similar to the split tables, or at least give you a good start.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

jmpsplitcol <- function(data,splitby,splitcols,group){

  require(reshape)
  require(tidyr)

  aggsplitlist <- data[ ,names(data) %in% c(splitby,group)]
  aggsplitlist <- lapply(aggsplitlist,`[`)

  agg <- aggregate(list(data[ ,names(data) %in% splitcols]),aggsplitlist,FUN = paste,collapse = ',')

  newgat <- gather_(data = agg,key = 'splitcolname','myval',splitcols)

  castformula <- as.formula(paste(paste(group,collapse = ' + '),'~','splitcolname','+',splitby))
  output <- cast(data = newgat,formula = castformula,value = 'myval')
  output
}
res <- jmpsplitcol(x,c('year'),c('engine.disp','model'),c('origin','cylinders'))
head(res2)

Upvotes: 1

Related Questions