Gabriele Filomena
Gabriele Filomena

Reputation: 13

Extract values from data frame, according to column names derived from a second data frame

I am working with two different data frames. In the first one, I have as rows names of geographic centroids (point), and as column names IDs of an other set of points (train stations). The values represent the distance between the centroids and the points in the column (extracted by gDistance).

From this one, I extracted the 3 closest points saving their IDs in a translated data frame. This one has now as row the centroids IDs and as column just the rank (1,2,3).

distances<- ft2miles(gDistance(stations, centroids, byid=TRUE))
access <-as.data.frame(apply(distances, 1, function(X) colnames(distances)[order(X)][1:3]))
access <- as.data.frame(t(access))

I would like to add to the last data frame three columns with the corresponding distance values. To be more clear, the columns of the distances data frame, correspond to the values of the second one, row names are identical.

First data frame:

                   158      1559     1560     1561      1722      1723     1732 166516384      1546
croatia_new.0 1.108585 0.9434521 1.138825 1.621118 0.8457217 0.5554764 2.441048 0.7393321 6.4736425
croatia_new.1 1.107511 1.3381573 2.970976 3.448820 2.6142722 2.0175318 4.167618 1.6984548 6.2927608

Second data frame:

              closest1  closest2  closest3
croatia_new.0     1723 166516384      1722
croatia_new.1      158      1559 166516384
croatia_new.2     1546      1584  98349400
croatia_new.3     1584      1546  98349400

The values of this data frame are essentially the column names of the first data frame, indicating the ID of the closest stations. What I want to obtain:

                closest1    closest2    closest3
croatia_new.0   0.5554764   0.739332    0.8457217
croatia_new.1   1.107511    1.3381573   1.6984548
croatia_new.N    ....         ....       ....

Output obtained using DigEmAll solution

                     V1       V2       V3
croatia_new.0  3.719507       NA       NA
croatia_new.1 63.082290 38.04791 36.94511
croatia_new.2        NA       NA 49.05636
croatia_new.3 55.836949 47.74032 46.48799
croatia_new.4 41.484741       NA 52.85941
croatia_new.5 60.654512       NA 49.68948

console output:

> dput(distances["croatia_new.2",1:10])
structure(list(`158` = 6.99119002871976, `1559` = 7.05366412218041, 
`1560` = 6.99165462844427, `1561` = 6.79651949339854, `1722` =     7.09268368796389, 
    `1723` = 7.06552376797322, `1732` = 6.3722523477356, `166516384` = 7.1279696005328, 
`1546` = 0.153911810037831, `1584` = 0.433837036180278), .Names = c("158", 
"1559", "1560", "1561", "1722", "1723", "1732", "166516384", 
"1546", "1584"), row.names = "croatia_new.2", class = "data.frame")

dput(access["croatia_new.2",])

structure(list(V1 = structure(149L, .Names = "croatia_new.2", .Label =    c("1001", 
"1002", "1003", "1004", "1005", "1006", "1007", "1018", "1019", 
"1020", "1023", "1024", "1025", "1031", "1038", "1042", "1045", 
"1046", "1047", "1048", "1050", "1052", "1053", "1054", "1056", 
"105753770", "1059", "1061", "1062", "1064", "1065", "1067", 
"1073", "1076", "1085", "1088", "109", "1091", "1093", "1095", ....."999"),   class = "factor")), .Names = c("V1", 
"V2", "V3"), row.names = "croatia_new.2", class = "data.frame")

Upvotes: 0

Views: 1157

Answers (1)

digEmAll
digEmAll

Reputation: 57210

I think you want something like this :

### RECREATE THE INPUT EXAMPLE
DF1 <- read.csv(text=
"158,1559,1560,1561,1722,1723,1732,166516384,1546
croatia_new.0,1.108585,0.9434521,1.138825,1.621118,0.8457217,0.5554764,2.441048,0.7393321,6.4736425
croatia_new.1,1.107511,1.3381573,2.970976,3.448820,2.6142722,2.0175318,4.167618,1.6984548,6.2927608",
check.names=FALSE
)
DF2 <- read.csv(text=
"closest1,closest2,closest3
croatia_new.0,1723,166516384,1722
croatia_new.1,158,1559,166516384
croatia_new.2,1546,1584,98349400
croatia_new.3,1584,1546,98349400"
)

# ensure the values of DF2 are characters
DF2[] <- lapply(DF2, as.character)

# for each row of DF2, we search for the value in DF1 using the
# row names and column names stored in DF2.
# The result will be a list of numeric values.
rows2bind <- 
  lapply(1:nrow(DF2),function(i){
    station <- row.names(DF2)[i]
    row <- DF2[i,]
    row2 <- as.numeric(DF1[match(station,row.names(DF1)),])
    return(row2[match(as.character(row),colnames(DF1))])
  })

# we turn the list into a data.frame similar to DF2 but with values instead of column names
DF2ext <- setNames(as.data.frame(do.call(rbind,rows2bind),row.names=row.names(DF2)), colnames(DF2))

# we add the closest columns to DF1 
DF1ext <- cbind(DF1,DF2ext[match(row.names(DF1),row.names(DF2ext)),])

Results :
(note the last 3 columns of DF1ext)

> DF2ext
               closest1  closest2  closest3
croatia_new.0 0.5554764 0.7393321 0.8457217
croatia_new.1 1.1075110 1.3381573 1.6984548
croatia_new.2        NA        NA        NA
croatia_new.3        NA        NA        NA

> DF1ext
                   158      1559     1560     1561      1722      1723     1732 166516384     1546  closest1  closest2  closest3
croatia_new.0 1.108585 0.9434521 1.138825 1.621118 0.8457217 0.5554764 2.441048 0.7393321 6.473643 0.5554764 0.7393321 0.8457217
croatia_new.1 1.107511 1.3381573 2.970976 3.448820 2.6142722 2.0175318 4.167618 1.6984548 6.292761 1.1075110 1.3381573 1.6984548

Upvotes: 2

Related Questions