How to split dataframe, create new variable in each list dataframe, and unsplit?

Question

I am trying to split a dataframe, create a new variable in each dataframe list object, and reassemble (unsplit) the original dataframe.

The new variable I am trying to create scales the variable B.2 from 0 to 1 for each factor level in the variable Type.

BWRX$B.2 <- BWRX$B #Create a new version of B
BWRX.Split <- split(BWRX, BWRX$Type) #Split by Type
BWRX.Split.BScaled <-lapply(BWRX.Split, function(df){df$B.3 <- (df$B.2-min(df$B.2))/(max(df$B.2)-min(df$B.2))}) #Scale B.2

The above code returns a list with the values of B.2 correctly scaled within each factor level. The tricky part is that I cannot figure out how to add this variable to each dataframe in BWRX.Split.

I thought df$B.3 would correct for this, but it has not. Once B.3 is part of each dataframe, can unsplit(, Type) be used to reassemble the dataframes, or would do.call be better? I was trying to combine unsplit and split so everything would be in one line of code. Is there a more efficient method?

akrun · Accepted Answer

We don't really need to split the data frame; this can be done with ave from base R. The advantage is that the new column is added in the same row order as the original dataset.

transform(BWRX, BScaled = ave(B.2, Type,
        FUN = function(x) (x - min(x)) / (max(x) - min(x))))
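If the split/unsplit route is still preferred, the likely reason the original attempt returns only the scaled values is that an R function returns the value of its last expression, which here is the assignment to df$B.3 rather than df itself. A minimal sketch of that fix follows; the toy BWRX values are made up for illustration and are not the asker's data.

```r
# Toy data standing in for BWRX (hypothetical values, for illustration only)
BWRX <- data.frame(
  Type = c("a", "b", "a", "b", "a", "b"),
  B    = c(2, 10, 4, 30, 6, 20)
)
BWRX$B.2 <- BWRX$B

# Split by Type, add the scaled column, and return the whole data frame
BWRX.Split <- split(BWRX, BWRX$Type)
BWRX.Split.BScaled <- lapply(BWRX.Split, function(df) {
  rng <- range(df$B.2)                          # c(min, max) within this group
  df$B.3 <- (df$B.2 - rng[1]) / (rng[2] - rng[1])
  df                                            # return the data frame, not the assignment value
})

# unsplit() takes the same grouping vector used in split() and restores row order
BWRX.Scaled <- unsplit(BWRX.Split.BScaled, BWRX$Type)
```

Note that a group whose B.2 values are all identical has max equal to min, so the division gives NaN for that group with any of these approaches.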

Scaling within each Type is a group-by operation, so it can also be done efficiently with data.table or dplyr.

library(data.table)
setDT(BWRX)[, BScaled := (B.2 - min(B.2))/(max(B.2) - min(B.2)), by = Type]
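The dplyr version is not shown above; a sketch of what it could look like, using group_by() and mutate(), is given here. The toy BWRX values and the name BWRX.Scaled are assumptions for illustration.

```r
library(dplyr)

# Toy data standing in for BWRX (hypothetical values, for illustration only)
BWRX <- data.frame(
  Type = c("a", "b", "a", "b", "a", "b"),
  B.2  = c(2, 10, 4, 30, 6, 20)
)

# Scale B.2 to the 0-1 range within each Type; row order is left unchanged
BWRX.Scaled <- BWRX %>%
  group_by(Type) %>%
  mutate(BScaled = (B.2 - min(B.2)) / (max(B.2) - min(B.2))) %>%
  ungroup()
```

Unlike the data.table call, which adds the column to BWRX by reference, this returns a new object, so the result has to be assigned.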
