Reputation: 545588
I have data of the following format:
gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
I would now like to attach, to each row, the total sum of all the elements in the row (my actual function is more complex but sum
illustrates the problem).
Without dplyr, I’d write
cbind(x, Sum = apply(x, 1, sum))
Resulting in:
A C G T Sum
1 3 1 6 9 19
2 3 4 3 3 13
3 3 1 10 5 19
4 7 2 1 6 16
…
But it seems surprisingly hard to do this with dplyr.
I’ve tried
x %>% rowwise() %>% mutate(Sum = sum(A : T))
But the result is not the sum of the columns of each row, it’s something unexpected and (to me) inexplicable.
I’ve also tried
x %>% rowwise() %>% mutate(Sum = sum(.))
But here, .
is simply a placeholder for the whole x
. Providing no argument does, unsurprisingly, also not work (results are all 0
). Needless to say, none of these variants works without rowwise()
, either.
(There isn’t really any reason to necessarily do this in dplyr, but (a) I’d like to keep my code as uniform as possible, and jumping between different APIs doesn’t help; and (b) I’m hoping to one day get automatic and free parallelisation of such commands in dplyr.)
Upvotes: 10
Views: 2246
Reputation: 67778
I once did something similar, and by that time I ended up with:
x %>%
rowwise() %>%
do(data.frame(., res = sum(unlist(.))))
# A C G T res
# 1 3 2 8 6 19
# 2 6 1 7 10 24
# 3 4 8 6 7 25
# 4 6 4 7 8 25
# 5 6 10 7 2 25
# 6 7 1 2 2 12
# 7 5 4 8 5 22
# 8 9 2 3 2 16
# 9 3 4 7 6 20
# 10 7 5 3 9 24
Perhaps your more complex function works fine without unlist
, but it seems like it is necessary for sum
. Because .
refers to the "current group", I initially thought that .
for e.g. the first row in the rowwise
machinery would correspond to x[1, ]
, which is a list, which sum
swallows happily outside do
is.list((x[1, ]))
# [1] TRUE
sum(x[1, ])
# [1] 19
However, without unlist
in do
an error is generated, and I am not sure why:
x %>%
rowwise() %>%
do(data.frame(., res = sum(.)))
# Error in sum(.) : invalid 'type' (list) of argument
Upvotes: 5
Reputation: 70266
I'll try to show an example of what I wrote in my comment. Let's assume you have a custom-function f
:
f <- function(vec) sum(vec)^2
And you want to apply this function to each row of your data.frame x. One option in base R would be to use apply
, as you show in your question:
> transform(x, z = apply(x, 1, f))
# A C G T z
#1 5 7 10 7 841
#2 1 9 5 9 576
#3 7 10 2 4 529
#4 1 4 10 1 256
#5 4 4 5 2 225
#6 9 1 6 8 576
#7 9 3 7 1 400
#8 5 2 7 5 361
#9 6 3 10 4 529
#10 5 10 1 6 484
Little disadvantage here is, because you are using apply
on a data.frame, the whole data.frame is converted to matrix
first and this would mean of course that all columns are converted to the same type.
With dplyr (and tidyr) you could solve the problem with gathering/melting and spreading/casting afterwards.
library(dplyr)
library(tidyr)
x %>%
mutate(n = row_number()) %>% # add row numbers for grouping
gather(key, value, A:T) %>%
group_by(n) %>%
mutate(z = f(value)) %>%
ungroup() %>%
spread(key, value) %>%
select(-n)
#Source: local data frame [10 x 5]
#
# z A C G T
#1 841 5 7 10 7
#2 576 1 9 5 9
#3 529 7 10 2 4
#4 256 1 4 10 1
#5 225 4 4 5 2
#6 576 9 1 6 8
#7 400 9 3 7 1
#8 361 5 2 7 5
#9 529 6 3 10 4
#10 484 5 10 1 6
This is obviously quite a bit longer code than using apply
but as soon as the data get a bit larger, I expect this to be a lot faster than any apply
over the rows of a data.frame.
Alternatively, you could use rowwise
if you specify the columns manually:
x %>%
rowwise %>%
mutate(z = f(c(A,C,G,T))) # manual column specification
#Source: local data frame [10 x 5]
#Groups: <by row>
#
# A C G T z
#1 5 7 10 7 841
#2 1 9 5 9 576
#3 7 10 2 4 529
#4 1 4 10 1 256
#5 4 4 5 2 225
#6 9 1 6 8 576
#7 9 3 7 1 400
#8 5 2 7 5 361
#9 6 3 10 4 529
#10 5 10 1 6 484
I haven't figured out yet, if the rowwise
solution can be changed so that it would work with character input of the column names - perhaps with lazyeval somehow.
set.seed(16457)
gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
Upvotes: 1
Reputation: 3488
Does this do what you'd like?
Data %>%
mutate(SumVar=rowSums(.))
Upvotes: 3