Steve Rowe

Reputation: 19413

How do I use dplyr to generate a new column based on rowwise data?

I want to add a new column to a data frame which is based on a row-wise calculation. Suppose I have a data frame such as this one:

x <- as.data.frame(matrix(1:10, 5, 2))

  V1 V2
1  1  6
2  2  7
3  3  8
4  4  9
5  5 10

If I want to do some rowwise operation to generate a new column, I can use rowwise() and do() to accomplish that. For example:

y <- rowwise(x) %>% do (foo = .$V1 * .$V2)

I can even append this to the existing data frame as such:

y <- rowwise(x) %>% bind_cols(do (., foo = .$V1 * .$V2))

This all works, but the result isn't quite what I want: y$foo is a list column, not a numeric vector.

  V1 V2 foo
1  1  6   6
2  2  7  14
3  3  8  24
4  4  9  36
5  5 10  50

Looks right, but it isn't.

class(y$foo)
[1] "list"

So, two questions:

  1. Is there a way to make the results numeric instead of lists?
  2. Is there a better way I should be approaching this?

Update:
This is closer to what I am trying to do. Given this function:

pts <- 11:20
z <- function(x1, x2) {
  min(x1*x2*pts)
}

This doesn't produce what I expect:

y <- x %>% mutate(foo = z(V1, V2))
  V1 V2 foo
1  1  6  66
2  2  7  66
3  3  8  66
4  4  9  66
5  5 10  66

while this does:

y <- rowwise(x) %>% bind_cols(do(., data.frame(foo = z(.$V1, .$V2))))
  V1 V2 foo
1  1  6  66
2  2  7 154
3  3  8 264
4  4  9 396
5  5 10 550

Why? Is there a better way?

Upvotes: 4

Views: 1422

Answers (3)

David Arenburg

Reputation: 92292

I generally don't believe in row-wise operations in a vectorized language such as R. In your case, you could solve the problem with a simple matrix multiplication.

You could define z as follows:

z <- function(x1, x2) {
  do.call(pmin, as.data.frame(tcrossprod(x1 * x2, pts)))
}

Then a simple mutate will do:

x %>% mutate(foo = z(V1, V2))
#   V1 V2 foo
# 1  1  6  66
# 2  2  7 154
# 3  3  8 264
# 4  4  9 396
# 5  5 10 550

You could also improve performance by using the matrixStats::rowMins function (which is fully vectorized):

library(matrixStats)

z <- function(x1, x2) {
  rowMins(tcrossprod(x1 * x2, pts))
}

x %>% mutate(foo = z(V1, V2))
#   V1 V2 foo
# 1  1  6  66
# 2  2  7 154
# 3  3  8 264
# 4  4  9 396
# 5  5 10 550
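
To make the matrix step above more concrete, here is a minimal base R sketch of what tcrossprod builds and how the row minimum is then taken. The names prods, m and row_min are illustrative only and do not appear in the answer; outer() is used here as an equivalent way to form the same 5 x 10 matrix.

```r
# Points vector and per-row products taken from the question's data
pts <- 11:20
prods <- c(1, 2, 3, 4, 5) * c(6, 7, 8, 9, 10)

# Entry [i, j] is prods[i] * pts[j]; this matches tcrossprod(prods, pts)
m <- outer(prods, pts)
dim(m)  # 5 10

# pmin() across the columns gives the minimum of each row in one vectorized call
row_min <- do.call(pmin, as.data.frame(m))
print(row_min)
```

Each row of m holds every candidate value for one row of the data frame, so taking the minimum along rows reproduces what z would return if it were called once per row.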

Upvotes: 6

shadow

Reputation: 22293

You should just return a data.frame in your do statement:

y <- rowwise(x) %>% bind_cols(do(., data.frame(foo = .$V1 * .$V2)))
y
##   V1 V2 foo
## 1  1  6   6
## 2  2  7  14
## 3  3  8  24
## 4  4  9  36
## 5  5 10  50
y$foo
## [1]  6 14 24 36 50
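
As a rough illustration of the list-versus-numeric distinction, here is a base R analogue (an assumption for demonstration, not what do() does internally). It builds a list column by hand and then flattens it with unlist().

```r
x <- as.data.frame(matrix(1:10, 5, 2))

# Hypothetical analogue: store each row's product as a separate list element
y <- x
y$foo <- lapply(seq_len(nrow(x)), function(i) x$V1[i] * x$V2[i])
is.list(y$foo)      # TRUE: prints like numbers, but is a list column

# Flatten the length-one elements into an ordinary vector
y$foo <- unlist(y$foo)
is.numeric(y$foo)   # TRUE
```

This only works cleanly when every list element has length one; returning a data.frame from do(), as above, avoids the list column in the first place.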

In your updated question, you are missing the rowwise in the chain with the mutate statement, but have the rowwise in the chain with the do statement. Just add rowwise and you will get the same result.

x %>% rowwise %>% mutate(foo = z(V1, V2))
## Source: local data frame [5 x 3]
## Groups: <by row>
## 
##   V1 V2 foo
## 1  1  6  66
## 2  2  7 154
## 3  3  8 264
## 4  4  9 396
## 5  5 10 550
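
The reason the plain mutate gives 66 everywhere can be checked without dplyr. The sketch below uses only base R; whole_columns and per_row are illustrative names, and mapply stands in for the per-row evaluation that rowwise provides.

```r
x <- as.data.frame(matrix(1:10, 5, 2))
pts <- 11:20
z <- function(x1, x2) {
  min(x1 * x2 * pts)
}

# Called on whole columns, z() receives two length-5 vectors and returns ONE
# number, which is then recycled into every row of the new column
whole_columns <- z(x$V1, x$V2)
length(whole_columns)   # 1

# Calling z() once per pair of values gives one result per row
per_row <- mapply(z, x$V1, x$V2)
length(per_row)         # 5
```

In other words, min() collapses everything it is given to a single value, so z is not vectorized over its arguments unless something calls it row by row.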

Upvotes: 4

Nader Hisham

Reputation: 5414

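x <- as.data.frame(matrix(1:10, 5, 2))

# Apply prod() to each row (MARGIN = 1); the result is a plain vector, not a list
foo <- apply(x, 1, function(row) {
  prod(row)
})

# [1]  6 14 24 36 50

class(foo)

# [1] "numeric"

# Append the new column to the original data frame
df_final <- cbind(x, foo)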

Upvotes: 1
