Could anyone explain the split and list functions in R? I am quite confused about how to use them together. For example:
x <- rnorm(10)
a <- gl(2,5)
b <- gl(5,2)
str(split(x, list(a, b)))
The result I get is
List of 10
$ 1.1: num [1:2] 0.1326 -0.0578
$ 2.1: num(0)
$ 1.2: num [1:2] 0.151 0.907
$ 2.2: num(0)
$ 1.3: num -0.393
$ 2.3: num 1.83
$ 1.4: num(0)
$ 2.4: num [1:2] 0.4266 -0.0116
$ 1.5: num(0)
$ 2.5: num [1:2] 0.62 1.64
How are the values in x assigned to a level of list(a, b)? Why are there some levels without any values and some with more than one value? I do not see any relation between the values in x and the levels of list(a, b). Are they randomly assigned?
I would really appreciate it if someone could help me with this.
When you call split(x, list(a, b)), you are basically saying that two x values are in the same group if they have the same a value and the same b value, and in different groups otherwise. Looking at the two factors side by side:
list(a, b)
# [[1]]
# [1] 1 1 1 1 1 2 2 2 2 2
# Levels: 1 2
#
# [[2]]
# [1] 1 1 2 2 3 3 4 4 5 5
# Levels: 1 2 3 4 5
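A quick way to see this pairing for every element at once is to combine the two factors with interaction(). When f is a list, split() hands it to interaction() internally, so the grouping is the same. The sketch below reuses the question's a and b; the name grp is just an illustrative choice.

```r
x <- rnorm(10)
a <- gl(2, 5)
b <- gl(5, 2)

# interaction() pastes the levels of a and b together element by element,
# so position i gets the label "a[i].b[i]"
grp <- interaction(a, b)

# One row per element of x, showing the group it is sent to
print(data.frame(x = round(x, 3), a = a, b = b, group = grp))

# Splitting by the list of factors and by their interaction
# gives the same result
print(identical(split(x, list(a, b)), split(x, grp)))  # TRUE
```

grp has all 10 level combinations (1.1, 2.1, 1.2, ..., 2.5), even though only 6 of them are actually used.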
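We can see that the first two elements in x are going to be in group 1.1 (the group where a=1 and b=1), the next two in group 1.2, the next one in group 1.3, the next one in group 2.3, the next two in group 2.4, and the last two in group 2.5. So the assignment is not random: it depends only on the position of each value, because the i-th value of x goes with the i-th values of a and b. This is exactly what we see when we call split(x, list(a, b)) (the numbers differ from yours because rnorm() draws new values each time):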
split(x, list(a, b))
# $`1.1`
# [1] -0.2431983 -1.5747339
# $`2.1`
# numeric(0)
# $`1.2`
# [1] -0.1058044 -0.8053585
# $`2.2`
# numeric(0)
# $`1.3`
# [1] -1.538958
# $`2.3`
# [1] 0.8363667
# $`1.4`
# numeric(0)
# $`2.4`
# [1] 0.8391658 -1.0488495
# $`1.5`
# numeric(0)
# $`2.5`
# [1] 0.3141165 -1.1813052
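Because x is random, its values say nothing about the grouping. To confirm that elements are assigned purely by position, you can split the indices 1:10 instead of random numbers. This substitution is only for illustration.

```r
a <- gl(2, 5)
b <- gl(5, 2)

# Use the positions 1..10 as the data, so each value shows
# which element of x it stands for
idx <- 1:10
s <- split(idx, list(a, b))
str(s)

# Expected groups: 1.1 -> 1 2, 1.2 -> 3 4, 1.3 -> 5, 2.3 -> 6,
# 2.4 -> 7 8, 2.5 -> 9 10; 2.1, 2.2, 1.4 and 1.5 are empty
```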
The reason you get extra empty groups (e.g. group 2.1) is that split() creates one group for every combination of the levels of a and b (2 × 5 = 10 groups), but some of those combinations never occur together, so no x values fall into them. From ?split, you can read that the way to leave these out of the output is the drop=TRUE option:
split(x, list(a, b), drop=TRUE)
# $`1.1`
# [1] -0.2431983 -1.5747339
# $`1.2`
# [1] -0.1058044 -0.8053585
# $`1.3`
# [1] -1.538958
# $`2.3`
# [1] 0.8363667
# $`2.4`
# [1] 0.8391658 -1.0488495
# $`2.5`
# [1] 0.3141165 -1.1813052
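To see in advance which groups will be empty, you can cross-tabulate the two factors. The zero cells of table(a, b) are the combinations that split() returns as empty vectors and that drop=TRUE removes. The sketch below uses only base R.

```r
a <- gl(2, 5)
b <- gl(5, 2)
x <- rnorm(10)

# Cross-tabulate the two factors: a = 1 only meets b = 1, 2, 3 and
# a = 2 only meets b = 3, 4, 5, so the other four cells are 0
print(table(a, b))

# The same counts read off the split() result (one count per group)
print(lengths(split(x, list(a, b))))

# Only the non-empty combinations survive drop = TRUE
print(names(split(x, list(a, b), drop = TRUE)))
```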