Darth Ratus
Darth Ratus

Reputation: 49

Adding a new column in a data frame using a formula that combines entries from other columns

I have a data frame "Values" with entries as follows:

     Test1 Test2 Test3 


Power1 20 30 40

Power2 10 15 13

Max_Power1 30 50 90

Max_Power2 50 80 40

Voltage1 1.2 1.2 1.2

Voltage2 1.3 1.3 1.3

Current1 1.0 1.2 1.3

Current2 2.0 2.1 2.2

I need to create a new column that will contain the following:

Short of creating this as a separate data frame with one column, setting each row entry individually and then importing into original data frame I do not know of a way to do this with a single commmand.

Suggestions are appreciated.

Upvotes: 0

Views: 64

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269461

Using test given in the Note at the end create a vector of function names, fun, and mapply the each function to the the corresponding row:

avg <- mean
fun <- c("avg", "avg", "max", "max", "avg", "avg", "min", "min")
n <- nrow(test)
rows <- split(as.matrix(test), 1:n)
transform(test, Value = mapply(function(fun, x) match.fun(fun)(x), fun, rows), Stat = fun)

giving:

           Test1 Test2 Test3    Value Stat
Power1      20.0  30.0  40.0 30.00000  avg
Power2      10.0  15.0  13.0 12.66667  avg
Max_Power1  30.0  50.0  90.0 90.00000  max
Max_Power2  50.0  80.0  40.0 80.00000  max
Voltage1     1.2   1.2   1.2  1.20000  avg
Voltage2     1.3   1.3   1.3  1.30000  avg
Current1     1.0   1.2   1.3  1.00000  min
Current2     2.0   2.1   2.2  2.00000  min

Perhaps another possibility is to simply calculate the min, mean and max of every row.

stats <- function(x) c(Min = min(x), Avg = mean(x), Max = max(x))
cbind(test, t(apply(test, 1, stats)))

giving:

           Test1 Test2 Test3  Min       Avg  Max
Power1      20.0  30.0  40.0 20.0 30.000000 40.0
Power2      10.0  15.0  13.0 10.0 12.666667 15.0
Max_Power1  30.0  50.0  90.0 30.0 56.666667 90.0
Max_Power2  50.0  80.0  40.0 40.0 56.666667 80.0
Voltage1     1.2   1.2   1.2  1.2  1.200000  1.2
Voltage2     1.3   1.3   1.3  1.3  1.300000  1.3
Current1     1.0   1.2   1.3  1.0  1.166667  1.3
Current2     2.0   2.1   2.2  2.0  2.100000  2.2

Note

We assume that test is:

test <- structure(list(Test1 = c(20, 10, 30, 50, 1.2, 1.3, 1, 2), 
Test2 = c(30, 15, 50, 80, 1.2, 1.3, 1.2, 2.1), Test3 = c(40, 13, 90, 40, 1.2, 
1.3, 1.3, 2.2)), .Names = c("Test1", "Test2", "Test3"), 
class = "data.frame", row.names = c("Power1", 
"Power2", "Max_Power1", "Max_Power2", "Voltage1", "Voltage2", "Current1", "Current2"))

Upvotes: 3

Len Greski
Len Greski

Reputation: 10855

The anomaly in this question is that we need to calculate some statistics that aren't typically calculated on rows. There are a number of ways to solve this problem.

First, we could use the rpgm package and its rowMins() and rowMaxs() functions to augment base:rowMeans(). Note that we want to only calculate on columns 2 - 4, because the Measurement column is not numeric.

rawData <- "Measurement  Test1 Test2 Test3 
Power1 20 30 40
Power2 10 15 13
Max_Power1 30 50 90
Max_Power2 50 80 40
Voltage1 1.2 1.2 1.2
Voltage2 1.3 1.3 1.3
Current1 1.0 1.2 1.3
Current2 2.0 2.1 2.2"

data <- read.table(text=rawData,header=TRUE,stringsAsFactors=FALSE)
# use rpgm package
library(rpgm)
rowMaxs(data[3:4,2:4])
rowMeans(data[c(1:2,5:6),2:4])
rowMins(data[7:8,2:4])

...and the output:

> library(rpgm)
> rowMaxs(data[3:4,2:4])
[1] 90 80
> rowMeans(data[c(1:2,5:6),2:4])
       1        2        5        6 
30.00000 12.66667  1.20000  1.30000 
> rowMins(data[7:8,2:4])
[1] 1 2

Another alternative is to reshape the data using the reshape2 package so we can calculate the statistics on columns rather than rows.

library(reshape2)
data <- read.table(text=rawData,header=TRUE,stringsAsFactors=FALSE)
data2 <- melt(data,id=c("Measurement"))
data3 <- dcast(data2,variable ~ Measurement)
colMeans(data3[,6:9])
colMins(data3[,2:3])
colMaxs(data3[,4:5])

...and the output:

> library(reshape2)
> data <- read.table(text=rawData,header=TRUE,stringsAsFactors=FALSE)
> data2 <- melt(data,id=c("Measurement"))
> data3 <- dcast(data2,variable ~ Measurement)
> colMeans(data3[,6:9])
  Power1   Power2 Voltage1 Voltage2 
30.00000 12.66667  1.20000  1.30000 
> colMins(data3[,2:3])
[1] 1 2
> colMaxs(data3[,4:5])
[1] 90 80

Upvotes: 1

Related Questions