Reputation: 127
I have this piece of code:
p.data=samp_data[,c('t_het_f','t_ane_f','t_loh_f')]
str(p.data)
head(p.data)
colnames(p.data)
head(apply(p.data,1,which.max))
which for one set of data produces this result:
'data.frame': 449 obs. of 3 variables:
$ t_het_f: num 0.663 0.688 0.746 0.429 0.484 ...
$ t_ane_f: num 0.291 0.3 0.247 0.398 0.261 ...
$ t_loh_f: num 0.04601 0.01236 0.00657 0.17376 0.2546 ...
t_het_f t_ane_f t_loh_f
1 0.6629108 0.2910798 0.046009390
...
6 0.7019118 0.2589706 0.039117647
[1] "t_het_f" "t_ane_f" "t_loh_f"
[1] 1 1 1 1 1 1
But for another set of data produces:
'data.frame': 587 obs. of 3 variables:
$ t_het_f: num 0.505 0.566 0.205 0.367 0.59 ...
$ t_ane_f: num 0.491 0.182 0.745 0.42 0.251 ...
$ t_loh_f: num 0.00427 0.25193 0.05003 0.21227 0.15891 ...
t_het_f t_ane_f t_loh_f
1 0.5048134 0.4909143 0.004272287
...
6 0.8159115 0.1829711 0.001117381
[1] "t_het_f" "t_ane_f" "t_loh_f"
[[1]]
t_het_f
1
[[2]]
t_het_f
1
Why would what looks to me like the same data structure (p.data) produce a vector in one case, and a list in another?
Upvotes: 0
Views: 652
Reputation: 127
Since the same function (which.max) was applied in both cases, it was not obvious that it might be returning values of different lengths for the two datasets. The difference was caused by NA values in the second dataset, which the first does not contain: which.max returns integer(0) for a row that has no non-NA values, so the per-row results no longer all have length 1.
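A minimal sketch that reproduces the behaviour. The data frame toy is made up for illustration and is not the asker's samp_data; the assumption is that at least one row of the second dataset is entirely NA.

```r
# Hypothetical toy data (not the original samp_data): row 2 is entirely NA
toy <- data.frame(t_het_f = c(0.6, NA, 0.2),
                  t_ane_f = c(0.3, NA, 0.7),
                  t_loh_f = c(0.1, NA, 0.1))

# which.max normally returns a single index ...
len_ok <- length(which.max(c(0.6, 0.3, 0.1)))
# ... but returns integer(0) when there is no non-NA value
len_na <- length(which.max(c(NA_real_, NA_real_, NA_real_)))

# With an all-NA row the per-row results differ in length,
# so apply() falls back to returning a list
res_list <- apply(toy, 1, which.max)
is.list(res_list)
lengths(res_list)

# Dropping the all-NA row brings back the plain vector
res_vec <- apply(toy[-2, ], 1, which.max)
is.list(res_vec)
```

Checking lengths(res_list) on the real data should show which rows come back with length 0.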
Upvotes: 0
Reputation: 887118
The return value of apply depends on the length of the output of each call to FUN, as mentioned in ?apply:
If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise. If n is 0, the result has length 0 but not necessarily the ‘correct’ dimension.
If the calls to FUN return vectors of different lengths, apply returns a list of length prod(dim(X)[MARGIN]) with dim set to MARGIN if this has length greater than one.
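The cases in the quoted documentation can be checked on a small matrix. This is only a sketch: the matrix m and the helper safe_which_max are made-up names for illustration, and the helper is one possible workaround, not something from ?apply.

```r
# Hypothetical 3 x 3 matrix; row 2 is entirely NA
m <- matrix(c(0.6, NA, 0.2,
              0.3, NA, 0.7,
              0.1, NA, 0.1), nrow = 3)

# n > 1: every call returns length 2, so the result is a 2 x 3 array
r_array <- apply(m[-2, ], 1, range)
r_array <- cbind(r_array, NA)[, 1:3]   # padded only so dim() below is 2 x 3
dim(apply(m, 1, range))                # range() keeps NA, still length 2 per row

# n == 1: every call returns length 1, so the result is a vector
r_vector <- apply(m[-2, ], 1, which.max)

# different lengths: which.max gives integer(0) for row 2, so the result is a list
r_list <- apply(m, 1, which.max)

# Possible workaround (assumed helper): always return exactly one value per row
safe_which_max <- function(x) {
  i <- which.max(x)
  if (length(i) == 0L) NA_integer_ else unname(i)
}
r_safe <- apply(m, 1, safe_which_max)
r_safe
```

Because safe_which_max always returns a length-1 value, apply goes back to returning a plain vector, with NA marking the rows that had nothing to compare.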
Upvotes: 0