Reputation: 10626
I am trying to use rowSds() to calculate each row's standard deviation so that I can pick the rows with high SDs to graph.
My data frame is called xx and looks like this:
head(xx,1)
Job variable 2012-02-23 2012-02-24 2012-02-25 2012-02-27 2012-02-28 2012-02-29 2012-03-01 2012-03-02 2012-03-03 2012-03-05 2012-03-06 2012-03-07 2012-03-08 2012-03-09 2012-03-10 2012-03-12 2012-03-13 2012-03-14
1 A Duration 152 424 NA 499 320 117 211 363 NA 605 76 309 204 185 NA 25 733 500
2012-03-15 2012-03-16 2012-03-17 2012-03-19 2012-03-20 2012-03-21 2012-03-22 2012-03-23 2012-03-24 2012-03-26 2012-03-27 2012-03-28 2012-03-29 2012-03-30 2012-03-31 2012-04-02 2012-04-03 2012-04-04 2012-04-05 2012-04-06
1 521 601 NA 229 758 421 334 659 NA 419 423 444 289 594 NA 327 533 183 211 235
2012-04-07 2012-04-09 2012-04-10 2012-04-11 2012-04-12 2012-04-13 2012-04-14 2012-04-16 2012-04-17 2012-04-18 2012-04-19 2012-04-20 2012-04-21 2012-04-23 2012-04-24 2012-04-25 2012-04-26 2012-04-27 2012-04-28 2012-04-30
1 NA 225 419 236 218 188 NA 205 547 153 196 200 NA 259 257 208 302 244 NA 806
2012-05-01 2012-05-02 2012-05-03 2012-05-04 2012-05-05 2012-05-07 2012-05-08 2012-05-09 2012-05-10 2012-05-11 2012-05-12 2012-05-14 2012-05-15 2012-05-16 2012-05-17 2012-05-18 2012-05-19 2012-05-21 2012-05-22 2012-05-23
1 402 492 1078 440 NA 382 576 1105 511 368 NA 360 381 1152 718 353 NA 408 413 935
2012-05-24 2012-05-25 2012-05-26 2012-05-28 2012-05-29 2012-05-30 2012-05-31 2012-06-01 2012-06-02 2012-06-04 2012-06-05 2012-06-06 2012-06-07 2012-06-08 2012-06-09 2012-06-11 2012-06-12 2012-06-13 2012-06-14 2012-06-15
1 306 277 NA 253 367 977 557 432 NA 328 521 467 972 1556 NA 386 1394 401 857 857
2012-06-16 2012-06-18 2012-06-19 2012-06-20 2012-06-21 2012-06-22 2012-06-23 2012-06-25 2012-06-26 2012-06-27 2012-06-28 2012-06-29 2012-06-30 2012-07-02 2012-07-03 2012-07-04 2012-07-05 2012-07-06 2012-07-07 2012-07-09
1 NA 1056 324 329 327 325 NA 341 268 231 245 301 NA 283 365 297 310 260 NA 254
2012-07-10 2012-07-11 2012-07-12 2012-07-13 2012-07-14 2012-07-16 2012-07-17 2012-07-18 2012-07-19 2012-07-20 2012-07-21 2012-07-23 2012-07-24 2012-07-25 2012-07-26 2012-07-27 2012-07-28 2012-07-30 2012-07-31 2012-08-01
1 283 395 273 273 NA 278 243 210 356 267 NA 442 483 271 327 271 NA 716 598 577
2012-08-02 2012-08-03 2012-08-06 2012-08-07 2012-08-08 2012-08-09 2012-08-10 2012-08-13 2012-08-14 2012-08-15 2012-08-16 2012-08-17 2012-08-20 2012-08-21 2012-08-22 2012-08-23 2012-08-24 2012-08-27 2012-08-28 2012-08-29
1 345 403 318 522 333 259 404 244 240 288 245 22 738 530 390 648 294 403 381 724
2012-08-30 2012-08-31 2012-09-03 2012-09-04 2012-09-05 2012-09-06 2012-09-07 2012-09-10 2012-09-11 2012-09-12 2012-09-13 2012-09-14 2012-09-17 2012-09-18 2012-09-19 2012-09-20 2012-09-21 2012-09-24 2012-09-25 2012-09-26
1 740 575 558 785 883 501 901 500 285 174 562 1047 603 990 289 173 253 512 236 278
2012-09-27 2012-09-28 2012-10-01 2012-10-02 2012-10-03 2012-10-04 2012-10-05 2012-10-08 2012-10-09 2012-10-10 2012-10-11
1 173 277 217 291 197 308 124 387 369 250 242
I am trying to calculate each row's standard deviation and assign it to a column named sd:
xx$sd<-rowSds(xx)
I get this error:
Error in apply(na.omit(as.matrix(x), ...), 1, FUN, ...) :
error in evaluating the argument 'X' in selecting a method for function 'apply': Error in na.omit(as.matrix(x), ...) :
error in evaluating the argument 'object' in selecting a method for function 'na.omit': Error in `colnames<-`(`*tmp*`, value = c("2012-02-23", "2012-02-24", "2012-02-25", :
length of 'dimnames' [2] not equal to array extent
Any ideas on how I can omit NA values when calculating the SD? Is my syntax correct?
Upvotes: 15
Views: 79797
Reputation: 2709
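This also works, based on this answer:
library(dplyr)  # provides %>%, group_by(), row_number(), do()
set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
# Names of the columns to include in the row-wise SD
vars_to_sum <- grep("X", names(X), value = TRUE)
X %>%
  group_by(row_number()) %>%
  do(data.frame(.,
                SD = sd(unlist(.[vars_to_sum]), na.rm = TRUE)))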
...which appends a couple of row number columns, so probably better to explicitly add your row IDs for grouping.
X %>%
  mutate(ID = row_number()) %>%
  group_by(ID) %>%
  do(data.frame(., SD = sd(unlist(.[vars_to_sum]), na.rm = TRUE)))
This syntax also lets you specify which columns you want to use.
Upvotes: 1
Reputation: 61154
You can use the apply and transform functions:
set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
transform(X, SD=apply(X,1, sd, na.rm = TRUE))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 SD
1 NA 12 17 18 19 16 12 13 20 14 3.041381
2 14 12 13 13 14 18 16 17 20 10 3.020302
3 11 19 NA 12 19 19 19 20 12 20 3.865805
4 10 11 20 12 15 17 18 17 18 12 3.496029
5 12 15 NA 14 20 18 16 11 14 18 2.958040
6 19 11 10 20 13 14 17 16 10 16 3.596294
7 14 16 17 15 10 11 15 15 11 16 2.449490
8 NA 10 15 19 19 12 15 15 19 14 3.201562
9 11 NA NA 20 20 14 14 17 14 19 3.356763
10 15 13 14 15 NA 13 15 NA 15 12 1.195229
From ?apply you can see the ... argument, which passes optional arguments on to FUN. In this case you can use na.rm=TRUE to omit NA values.
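As a small illustration of that forwarding, here is a sketch on a made-up matrix (the names m, sd_with_na and sd_without_na are hypothetical, not from the question). It only shows that any extra argument given to apply is handed to sd for each row.

```r
# Hypothetical 2 x 3 matrix with one missing value in the first row
m <- matrix(c(1, 2, NA,
              4, 5, 6), nrow = 2, byrow = TRUE)

# Without na.rm, any row containing NA yields NA
sd_with_na <- apply(m, 1, sd)

# na.rm = TRUE is passed through apply's ... to sd() for every row
sd_without_na <- apply(m, 1, sd, na.rm = TRUE)

print(sd_with_na)
print(sd_without_na)
```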
Using rowSds from the matrixStats package also requires setting na.rm=TRUE to omit NA values:
library(matrixStats)
# rowSds() expects a numeric matrix, so convert the data frame first; same result as before.
transform(X, SD = rowSds(as.matrix(X), na.rm = TRUE))
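The data frame in the question also has two non-numeric columns (Job and variable) in front of the date columns, which the examples above do not have. A minimal sketch of one way to handle that, assuming a small made-up stand-in for xx (the values for jobs B and C and the helper name num_cols are invented for illustration), is to restrict the calculation to the numeric columns:

```r
# Hypothetical stand-in for the asker's xx: two label columns, then numeric date columns
xx <- data.frame(
  Job = c("A", "B", "C"),
  variable = "Duration",
  `2012-02-23` = c(152, 10, 300),
  `2012-02-24` = c(424, 12, NA),
  `2012-02-25` = c(NA, 11, 310),
  check.names = FALSE
)

# Keep only numeric columns so the label columns do not coerce the matrix to character
num_cols <- vapply(xx, is.numeric, logical(1))

# Row-wise SD over the numeric columns, ignoring NA values
xx$sd <- apply(as.matrix(xx[, num_cols]), 1, sd, na.rm = TRUE)

# Rows ordered from highest to lowest SD, ready for picking the ones to graph
print(xx[order(xx$sd, decreasing = TRUE), c("Job", "sd")])
```

Selecting by is.numeric avoids hard-coding column positions; if the label columns are always the first two, xx[, -(1:2)] would be an equivalent shortcut.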
Upvotes: 36