user51966

Reputation: 1057

R: Using data.table inside the function of apply()

I have a distance matrix in which each row is an individual and each column is a facility. Each cell holds the distance from that individual to that facility.

> head(ODMatrix, 5)
   toFacility1 toFacility2 toFacility3 toFacility4 toFacility5 toFacility6 toFacility7 toFacility8 toFacility9 toFacility10
1:    4154.229    1835.176    5228.835    8093.985   7813.0557    2396.326    4055.081    4199.636    6790.750     4206.637
2:    4075.044    4848.875    3403.399    2575.370    501.4027    1072.520    1860.508    3188.388    2639.671     6118.273
3:    5660.299    3767.281    7249.469    4276.207   1917.6547    1288.333    3956.757    4511.083    1576.480     4940.198
4:    6853.425    1385.334    8696.045    7012.102   3201.9396    1708.367    4052.216    5352.751    5315.842     3218.540
5:    6746.253    1735.916    8397.047    5014.986   4820.9541    1681.347    3728.737    5334.818    6826.545     2085.071

Some of the facilities are stations and the others are poll stations. I want to know which of the two minimum distances is shorter. Facilities 1, 2, and 3 are stations, so station_col_numbers <- c(1,2,3); all other facilities are poll stations.
For example, for the first row the nearest station is Facility2 (1835.176 m) and the nearest poll station is Facility6 (2396.326 m). What I actually want to know is which of the two is closer. Here, since 1835.176 < 2396.326, the station is closer, so the dummy variable for this row is 0.

analyse <- function(row_I){
  row_I_withoutStation <- row_I[ , -station_col_numbers, with=F]
  row_I_ToStation <- row_I[ , station_col_numbers, with=F]
  toStation_min <- min(row_I_ToStation) 
  toPollStation_min <- min(col_I_withoutStation)

  if (toStation_min >= toPollStation_min){
    return(1) 
  }else{
    return(0) 
  }
}

However, when I use apply(), it fails.

Dummy <- apply(ODMatrix, 1, analyse)
 Error in row_I[, -station_col_numbers, with = F] : 
 incorrect number of dimensions

Is this a misuse of apply()? How can I solve it?

Upvotes: 1

Views: 91

Answers (2)

Jaap

Reputation: 83215

In base R you can create an integer (0/1) vector indicating whether a poll station is closest with:

ODMatrix$poll.closest <- +(apply(ODMatrix[,1:3], 1, min) > apply(ODMatrix[,4:10], 1, min))

which gives:

> ODMatrix
   toFacility1 toFacility2 toFacility3 toFacility4 toFacility5 toFacility6 toFacility7 toFacility8 toFacility9 toFacility10 poll.closest
1:    4154.229    1835.176    5228.835    8093.985   7813.0557    2396.326    4055.081    4199.636    6790.750     4206.637            0
2:    4075.044    4848.875    3403.399    2575.370    501.4027    1072.520    1860.508    3188.388    2639.671     6118.273            1
3:    5660.299    3767.281    7249.469    4276.207   1917.6547    1288.333    3956.757    4511.083    1576.480     4940.198            1
4:    6853.425    1385.334    8696.045    7012.102   3201.9396    1708.367    4052.216    5352.751    5315.842     3218.540            0
5:    6746.253    1735.916    8397.047    5014.986   4820.9541    1681.347    3728.737    5334.818    6826.545     2085.071            1
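
To try this without the full data set, here is a minimal sketch that rebuilds the five rows shown in the question as a plain matrix. The object name od is only a stand-in for ODMatrix, and the sketch reuses station_col_numbers from the question instead of hard-coding 1:3 and 4:10:

```r
# Stand-in for ODMatrix: the five rows printed in the question, as a plain matrix.
od <- matrix(c(
  4154.229, 1835.176, 5228.835, 8093.985, 7813.0557, 2396.326, 4055.081, 4199.636, 6790.750, 4206.637,
  4075.044, 4848.875, 3403.399, 2575.370,  501.4027, 1072.520, 1860.508, 3188.388, 2639.671, 6118.273,
  5660.299, 3767.281, 7249.469, 4276.207, 1917.6547, 1288.333, 3956.757, 4511.083, 1576.480, 4940.198,
  6853.425, 1385.334, 8696.045, 7012.102, 3201.9396, 1708.367, 4052.216, 5352.751, 5315.842, 3218.540,
  6746.253, 1735.916, 8397.047, 5014.986, 4820.9541, 1681.347, 3728.737, 5334.818, 6826.545, 2085.071
), nrow = 5, byrow = TRUE,
dimnames = list(NULL, paste0("toFacility", 1:10)))

station_col_numbers <- c(1, 2, 3)  # columns holding the stations, as in the question

# Row-wise minimum over the station columns and over all remaining columns.
station_min <- apply(od[, station_col_numbers, drop = FALSE], 1, min)
poll_min    <- apply(od[, -station_col_numbers, drop = FALSE], 1, min)

# Unary plus turns the logical comparison into 0/1.
poll_closest <- +(station_min > poll_min)
print(poll_closest)
# [1] 0 1 1 0 1
```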

With data.table you could do:

stations <- names(ODMatrix)[1:3]
pollstations <- names(ODMatrix)[4:10]
ODMatrix[, idx:=.I
         ][, dist.station := min(.SD), idx, .SDcols=stations
           ][, dist.poll := min(.SD), idx, .SDcols=pollstations
             ][, poll.closest := +(dist.station > dist.poll)
               ][, c("idx","dist.station","dist.poll"):=NULL]

to get the same result. Alternatively, you could also use:

ODMatrix[, poll.closest := +(pmin(toFacility1,toFacility2,toFacility3) >
           pmin(toFacility4,toFacility5,toFacility6,toFacility7,toFacility8,toFacility9,toFacility10))]
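
Since pmin() is already vectorised across its arguments, the column names do not have to be typed out and no grouping by row is needed. A hedged sketch, assuming the data.table package is installed; od is a stand-in built from the first three rows of the question, and station_min and poll_min are illustrative names:

```r
library(data.table)

# Stand-in for ODMatrix: the first three rows printed in the question.
m <- matrix(c(
  4154.229, 1835.176, 5228.835, 8093.985, 7813.0557, 2396.326, 4055.081, 4199.636, 6790.750, 4206.637,
  4075.044, 4848.875, 3403.399, 2575.370,  501.4027, 1072.520, 1860.508, 3188.388, 2639.671, 6118.273,
  5660.299, 3767.281, 7249.469, 4276.207, 1917.6547, 1288.333, 3956.757, 4511.083, 1576.480, 4940.198
), nrow = 3, byrow = TRUE,
dimnames = list(NULL, paste0("toFacility", 1:10)))
od <- as.data.table(m)

stations     <- names(od)[1:3]
pollstations <- setdiff(names(od), stations)

# do.call(pmin, .SD) passes every selected column to pmin(),
# which returns the element-wise (here: row-wise) minimum.
station_min <- od[, do.call(pmin, .SD), .SDcols = stations]
poll_min    <- od[, do.call(pmin, .SD), .SDcols = pollstations]

# Add the 0/1 dummy by reference.
od[, poll.closest := +(station_min > poll_min)]
print(od$poll.closest)
# [1] 0 1 1
```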

Upvotes: 1

Robert

Reputation: 5152

Modify your function; it has some typos/errors:

analyse <- function(row_I){ #row_I=ODMatrix[1,]
  col_I_withoutStation <- row_I[ -station_col_numbers]
  col_I_ToStation <- row_I[ station_col_numbers]
  toStation_min <- min(col_I_ToStation) 
  toPollStation_min <- min(col_I_withoutStation)
  #cat(toStation_min , toPollStation_min)
  if (toStation_min >= toPollStation_min){
    return(1) 
  }else{
    return(0) 
  }
}

apply(ODMatrix, 1, analyse)

You will get

[1] 0 1 1 0 1
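
The single-subscript form works because apply() first converts its input with as.matrix() and then hands each row to the function as a plain numeric vector, which has no dimensions for a two-subscript index such as row_I[, j] to act on. A small sketch to check this, using a two-row data.frame od as a stand-in for ODMatrix (the names od, row_classes, err and station_min are illustrative only):

```r
# Stand-in for ODMatrix: two rows and five columns taken from the question.
od <- data.frame(
  toFacility1 = c(4154.229, 4075.044),
  toFacility2 = c(1835.176, 4848.875),
  toFacility3 = c(5228.835, 3403.399),
  toFacility4 = c(8093.985, 2575.370),
  toFacility5 = c(7813.0557, 501.4027)
)
station_col_numbers <- c(1, 2, 3)

# What does the function actually receive for each row?
row_classes <- apply(od, 1, function(row_I) class(row_I))
print(row_classes)  # every row arrives as "numeric"

# Two-subscript indexing on such a vector raises an error; capture its message.
err <- tryCatch(
  apply(od, 1, function(row_I) row_I[, -station_col_numbers]),
  error = function(e) conditionMessage(e)
)
print(err)

# Single-subscript indexing works on the vector.
station_min <- apply(od, 1, function(row_I) min(row_I[station_col_numbers]))
print(station_min)
```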

Upvotes: 1
