Peter Chung
Peter Chung

Reputation: 109

Sorting data in file based on first column in another file

I would like to rearrange dataframe1 by the order of the dataframe2 column.

Dataframe1:

        N02_M   N05_F   N06_M   N07_F   N08_F   N09_M   N02_M   N026_F  N03_M
2237895 0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225
586     0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364
2255280 0.995   0.995   0.995   0.995   0.995   0.995   0.995   0.995   0.995
7294280 0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478
7499    0.803   0.803   0.803   0.803   0.803   0.803   0.803   0.803   0.803
35209   0.94    0.94    0.94    0.94    0.94    0.94    0.94    0.94    0.94

Here is data frame 2:

Dataframe2:

586
2237895
7499
35209
2255280
7294280

Result:

        N02_M   N05_F   N06_M   N07_F   N08_F   N09_M   N02_M   N026_F  N03_M
586     0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364
2237895 0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225
7499    0.803   0.803   0.803   0.803   0.803   0.803   0.803   0.803   0.803
35209   0.94    0.94    0.94    0.94    0.94    0.94    0.94    0.94    0.94
2255280 0.995   0.995   0.995   0.995   0.995   0.995   0.995   0.995   0.995
7294280 0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478

I tried the command awk

'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' df2 df1 

but it doesn't work, it doesn't change the order of the df2

Upvotes: 0

Views: 166

Answers (3)

Ed Morton
Ed Morton

Reputation: 203645

$ cat tst.awk
NR==FNR {
    if (NR==1) {
        print
    }
    else {
        map[$1] = $0
    }
    next
}
{ print map[$1] }

$ awk -f tst.awk dataframe1 dataframe2
        N02_M   N05_F   N06_M   N07_F   N08_F   N09_M   N02_M   N026_F  N03_M
586     0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364  0.8364
2237895 0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225  0.6225
7499    0.803   0.803   0.803   0.803   0.803   0.803   0.803   0.803   0.803
35209   0.94    0.94    0.94    0.94    0.94    0.94    0.94    0.94    0.94
2255280 0.995   0.995   0.995   0.995   0.995   0.995   0.995   0.995   0.995
7294280 0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478  0.8478

Upvotes: 1

user31264
user31264

Reputation: 6727

Let us prepare the data.

Dataframe1 <- cbind(a=rnorm(6), b=rnorm(6))
rownames(Dataframe1) <- round(rnorm(6)*1000000)
Dataframe2 <- cbind(sample(as.integer(rownames(Dataframe1))))

Dataframe1:

                 a            b
255728   -0.2063562 -0.804514251
215067    0.2756317 -0.865951064
2079729   1.1376277  0.001123908
-898713   0.1846505 -0.908527352
-717953   1.7424731  0.462088593
-1028794  0.4162211  1.548455286

Dataframe2:

         [,1]
[1,] -1028794
[2,]  -898713
[3,]  -717953
[4,]   215067
[5,]  2079729
[6,]   25572

And now... Hocus Pocus!

Dataframe1[order(rownames(Dataframe1))[rank(Dataframe2)],]
                  a            b
-1028794  0.4162211  1.548455286
2079729   1.1376277  0.001123908
215067    0.2756317 -0.865951064
255728   -0.2063562 -0.804514251
-898713   0.1846505 -0.908527352
-717953   1.7424731  0.462088593

The idea for such sorting is to use order(x)[rank(y)].

Upvotes: 0

Adam Quek
Adam Quek

Reputation: 7153

Here's a solution with dplyr

library(dplyr)
df1$id <- rownames(df1) # create common column "id" based on rownames

names(df2) <- "id" #change column name of df2 to common name "id"
df2$id <- as.character(df2$id) #ensure that the structure for df1$id and df2$id are in similar structure


left_join(df2, df1)

        id  N02_M  N05_F  N06_M  N07_F  N08_F  N09_M N02_M.1 N026_F  N03_M
1    34387     NA     NA     NA     NA     NA     NA      NA     NA     NA
2   202783     NA     NA     NA     NA     NA     NA      NA     NA     NA
3   887320     NA     NA     NA     NA     NA     NA      NA     NA     NA
4    35444     NA     NA     NA     NA     NA     NA      NA     NA     NA
5    88035     NA     NA     NA     NA     NA     NA      NA     NA     NA
6   239862     NA     NA     NA     NA     NA     NA      NA     NA     NA
7  9266359     NA     NA     NA     NA     NA     NA      NA     NA     NA
8      586 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364  0.8364 0.8364 0.8364
9   375535     NA     NA     NA     NA     NA     NA      NA     NA     NA
10 3794260     NA     NA     NA     NA     NA     NA      NA     NA     NA
11 6687758     NA     NA     NA     NA     NA     NA      NA     NA     NA
12 6983267     NA     NA     NA     NA     NA     NA      NA     NA     NA
13  704346     NA     NA     NA     NA     NA     NA      NA     NA     NA
14   35209 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400  0.9400 0.9400 0.9400
15  795668     NA     NA     NA     NA     NA     NA      NA     NA     NA

By the way, df1 == Dataframe1 and df2 == Dataframe2

Upvotes: 0

Related Questions