Reputation: 2022
JMP has a "split table" platform:
http://www.jmp.com/support/help/Split_Columns.shtml
Here is the image for it:
The "split by" becomes part of the column headers.
The "split columns" are the columns spread out.
The "group" are retained columns.
I have looked at a few links/pages and can't seem to get this right in R. Right now I have to kluge it into a macro in JMP.
Links that didn't help me include:
I need to split a table of ~20k rows and ~30 columns, along one of the columns (integers between 0 and 13), to being ~1400 rows with ~25 split into 350.
An inelegant, but repeatable, example is splitting this cars table
How do I do this and retain the ~5 non-split columns using an R library like tidyr or dplyr?
Upvotes: 2
Views: 553
Reputation: 11
I had this same issue and the suggested solution was close but didn't quite work for my data. I struggle to keep the JMP inputs straight, so I renamed them to make more sense:
data = data.frame
key_col = "Split By"
value_col = "Split Columns"
gather_cols = "Group"
jmpsplitcol <- function(data, key_col, value_col, gather_cols) {
require(tidyr)
require(dplyr)
# Group by the gather columns and the key column
data_grouped <- data %>%
group_by(across(all_of(gather_cols)), !!sym(key_col)) %>%
summarize(mean_value = mean(!!sym(value_col), na.rm = TRUE), .groups = 'drop')
# Spread the key column to create new columns
data_wide <- data_grouped %>%
spread(key = !!sym(key_col), value = mean_value)
return(data_wide)
}
I was getting a row for each unique value of "key_col", so I used "mean" to hack them back together since the mean of 'x', NA, NA, NA, NA, NA with na.rm = TRUE is just 'x'.
Upvotes: 1
Reputation: 2897
Using reshape, it's not too terrible to do one split column at a time. You could then merge the model and engine.disp together. For your real example, you could just change the lists in aggregate and formula in cast.
x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))
agg <- aggregate(list(model = x$model),list(origin = x$origin,cylinders = x$cylinders,year = x$year),FUN = paste,collapse = ',')
require(reshape)
output <- cast(data = agg,formula = origin + cylinders ~ year,value = 'model')
Edit: I haven't checked all possible cases, but this function should work similar to the split tables, or at least give you a good start.
x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))
jmpsplitcol <- function(data,splitby,splitcols,group){
require(reshape)
require(tidyr)
aggsplitlist <- data[ ,names(data) %in% c(splitby,group)]
aggsplitlist <- lapply(aggsplitlist,`[`)
agg <- aggregate(list(data[ ,names(data) %in% splitcols]),aggsplitlist,FUN = paste,collapse = ',')
newgat <- gather_(data = agg,key = 'splitcolname','myval',splitcols)
castformula <- as.formula(paste(paste(group,collapse = ' + '),'~','splitcolname','+',splitby))
output <- cast(data = newgat,formula = castformula,value = 'myval')
output
}
res <- jmpsplitcol(x,c('year'),c('engine.disp','model'),c('origin','cylinders'))
head(res2)
Upvotes: 1