TDo

Reputation: 744

R- Split + list function

Could anyone explain how the split and list functions work together in R? I am confused about how to combine them. For example:

x <- rnorm(10)
a <- gl(2,5)
b <- gl(5,2)
str(split(x, list(a, b)))

The result I get is

List of 10
$ 1.1: num [1:2] 0.1326 -0.0578
$ 2.1: num(0) 
$ 1.2: num [1:2] 0.151 0.907
$ 2.2: num(0) 
$ 1.3: num -0.393
$ 2.3: num 1.83
$ 1.4: num(0) 
$ 2.4: num [1:2] 0.4266 -0.0116
$ 1.5: num(0) 
$ 2.5: num [1:2] 0.62 1.64

How are values in x assigned to a level in list(a,b)? Why are there some levels without any values and some with many values? I do not see any relation between the values in x and the levels of list(a,b). Are they randomly assigned?

I would really appreciate it if someone could help me with this.

Upvotes: 0

Views: 172

Answers (1)

josliber

Reputation: 44299

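When you call split(x, list(a, b)), you are saying that two x values belong to the same group if they have the same a value and the same b value, and to different groups otherwise. The grouping depends only on each element's position in x, not on its value, so nothing is assigned randomly.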

list(a, b)
# [[1]]
#  [1] 1 1 1 1 1 2 2 2 2 2
# Levels: 1 2
# 
# [[2]]
#  [1] 1 1 2 2 3 3 4 4 5 5
# Levels: 1 2 3 4 5

We can see that the first two elements in x are going to be in group "1.1" (the group where a=1 and b=1), the next two will be in group 1.2, the next one will be in group 1.3, the next one will be in group 2.3, the next two will be in group 2.4, and the last two will be in group 2.5. This is exactly what we see when we call split(x, list(a, b)):

split(x, list(a, b))
# $`1.1`
# [1] -0.2431983 -1.5747339
# $`2.1`
# numeric(0)
# $`1.2`
# [1] -0.1058044 -0.8053585
# $`2.2`
# numeric(0)
# $`1.3`
# [1] -1.538958
# $`2.3`
# [1] 0.8363667
# $`1.4`
# numeric(0)
# $`2.4`
# [1]  0.8391658 -1.0488495
# $`1.5`
# numeric(0)
# $`2.5`
# [1]  0.3141165 -1.1813052
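
To make the position-based grouping easier to follow, here is a minimal sketch that replaces the random x with 1:10 (an assumption made purely for illustration, so each value equals its own position). It also builds the combined factor explicitly with interaction(), which is what split does internally when it is given a list of factors:

```r
# Illustration only: x is 1:10 instead of rnorm(10), so values equal positions.
x <- 1:10
a <- gl(2, 5)   # 1 1 1 1 1 2 2 2 2 2
b <- gl(5, 2)   # 1 1 2 2 3 3 4 4 5 5

# Combine the two factors element by element into "a.b" labels.
f <- interaction(a, b)
labels <- as.character(f)
# labels: "1.1" "1.1" "1.2" "1.2" "1.3" "2.3" "2.4" "2.4" "2.5" "2.5"

# Splitting by the list of factors groups positions by those labels.
groups <- split(x, list(a, b))
groups[["1.1"]]   # positions 1 and 2
groups[["1.3"]]   # position 5
groups[["2.3"]]   # position 6
groups[["2.1"]]   # integer(0): no position has a = 2 and b = 1
```

Because a changes from 1 to 2 between positions 5 and 6 while b stays at 3, the pair of elements sharing b = 3 ends up in two separate single-element groups, 1.3 and 2.3.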

The reason you have extra empty groups (e.g. group 2.1) is that split creates one group for every combination of a level of a with a level of b (2 x 5 = 10 here), and some of those combinations never occur in the data. From ?split, you can see that the way to exclude these from the output is the drop=TRUE option:

split(x, list(a, b), drop=TRUE)
# $`1.1`
# [1] -0.2431983 -1.5747339
# $`1.2`
# [1] -0.1058044 -0.8053585
# $`1.3`
# [1] -1.538958
# $`2.3`
# [1] 0.8363667
# $`2.4`
# [1]  0.8391658 -1.0488495
# $`2.5`
# [1]  0.3141165 -1.1813052
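
As a quick check of which combinations are empty, the following sketch counts the elements in each group. It assumes x is 1:10 rather than the random values above (only the group sizes matter here) and uses lengths(), which is available in R 3.2.0 and later:

```r
# Illustration only: the values of x do not affect the group sizes.
x <- 1:10
a <- gl(2, 5)
b <- gl(5, 2)

# Number of elements in each of the 2 x 5 = 10 possible groups.
sizes <- lengths(split(x, list(a, b)))
empty <- names(sizes)[sizes == 0]
# empty: "2.1" "2.2" "1.4" "1.5"

# With drop = TRUE only the combinations that actually occur are kept.
kept <- names(split(x, list(a, b), drop = TRUE))
# kept: "1.1" "1.2" "1.3" "2.3" "2.4" "2.5"
```

Calling table(a, b) gives the same information as a 2 x 5 contingency table, with zeros marking the combinations that drop=TRUE removes.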

Upvotes: 2
