Subset dataframe based on the condition in a column of another dataframe

Question

I have two data frames where each line represent data from one individual. Lines in the first data frame (that enter the specific analysis of geometric morphometry) correspond to the lines in the second data frame (additional descriptions of animals as sampling site or sex). I would like to subset the first data frame based on the condition form the second data frame (e.g. select all lines of the first data frame that are females, but sex of the animal is defined in the second dataframe). It is possible to do it by adding new column to the first data frame, subset it based on this new column and remove the column. Is there any other more elegant way to do it?

df1
                [,1]       [,2]        [,3]       [,4]       [,5]       [,6]
IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079
IMGP7012.JPG -0.07059829 0.07905529 0.021803102 0.07480276 0.04849282 0.07270644
IMGP7013.JPG -0.07176010 0.08561111 0.009568661 0.08297752 0.06374573 0.08272648
IMGP7014.JPG -0.06751993 0.08895038 0.016800152 0.08799522 0.04776876 0.08100145
IMGP7015.JPG -0.07945826 0.07844136 0.008176800 0.07431915 0.06471417 0.07348312
IMGP7017.JPG -0.06587874 0.09280032 0.010204330 0.09085868 0.05290771 0.08739235

df2
    number site     m m..evis. m..gonads. sex  SL       TL       AP       RP
37     10   KB 1.263    1.003      0.136   F 39.38949 47.72564       NA       NA
38     11   KB 4.215    3.510      0.093   F 53.48064 65.29663       NA       NA
39     12   KB 3.508    2.997      0.079   F 51.59589 64.76600       NA       NA
40     13   KB 3.250    2.752      0.085   F 49.55853 61.74319       NA       NA
41     14   KB 3.596    3.149      0.101   F 51.42303 64.79511       NA       NA
42     10  KKB 3.257    2.451      0.270   M 55.07909 67.52057 1468.017 598.9462
43     11  KKB 3.493    2.275      0.666   M 54.24882 65.61726 1722.414 757.1050
44     12  KKB 3.066    2.210      0.300   M 53.56323 64.09848 1410.891 638.4123
45     13  KKB 3.294    2.193      0.652   M 51.66717 63.49136 1428.063 651.1915
46     14  KKB 2.803    1.871      0.582   M 50.91185 60.90951 1236.438 660.8433

df1 after subset

                    [,1]       [,2]        [,3]       [,4]       [,5]       [,6]
IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079

www · Accepted Answer

df1[df2$sex %in% "F", ]
#                     [,1]       [,2]        [,3]       [,4]       [,5]       [,6]
# IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
# IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
# IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
# IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
# IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079

Explanation

Your df1 looks like a matrix, not a data.frame. But the solution I provided will also work if df1 is a data frame.

df2$sex %in% "F" reports if sex matches F. and reports a logical vector with TRUE and FALSE. After that, you can use that to subset df1.

Data

df1 <- matrix(c(-0.07612235, 0.08189661, 0.020690012, 0.07532420, 0.05373111, 0.07139840,
                -0.06759482, 0.09449720, 0.022907275, 0.08807724, 0.05953926, 0.08256468,
                -0.06902234, 0.08418980, 0.013522385, 0.08186618, 0.05375763, 0.07769076,
                -0.07201136, 0.08475765, 0.009462017, 0.08080315, 0.06148776, 0.07059229,
                -0.08112908, 0.08485488, 0.037193459, 0.07971364, 0.05834018, 0.07917079,
                -0.07059829, 0.07905529, 0.021803102, 0.07480276, 0.04849282, 0.07270644,
                -0.07176010, 0.08561111, 0.009568661, 0.08297752, 0.06374573, 0.08272648,
                -0.06751993, 0.08895038, 0.016800152, 0.08799522, 0.04776876, 0.08100145,
                -0.07945826, 0.07844136, 0.008176800, 0.07431915, 0.06471417, 0.07348312,
                -0.06587874, 0.09280032, 0.010204330, 0.09085868, 0.05290771, 0.08739235),
ncol = 6, byrow = TRUE)

rownames(df1) <- c("IMGP6995.JPG", "IMGP6997.JPG", "IMGP6998.JPG", "IMGP6999.JPG",
                   "IMGP7001.JPG", "IMGP7012.JPG", "IMGP7013.JPG", "IMGP7014.JPG",
                   "IMGP7015.JPG", "IMGP7017.JPG")

df2 <- read.table(text = "    number site     m m..evis. m..gonads. sex  SL       TL       AP       RP
37     10   KB 1.263    1.003      0.136   F 39.38949 47.72564       NA       NA
                  38     11   KB 4.215    3.510      0.093   F 53.48064 65.29663       NA       NA
                  39     12   KB 3.508    2.997      0.079   F 51.59589 64.76600       NA       NA
                  40     13   KB 3.250    2.752      0.085   F 49.55853 61.74319       NA       NA
                  41     14   KB 3.596    3.149      0.101   F 51.42303 64.79511       NA       NA
                  42     10  KKB 3.257    2.451      0.270   M 55.07909 67.52057 1468.017 598.9462
                  43     11  KKB 3.493    2.275      0.666   M 54.24882 65.61726 1722.414 757.1050
                  44     12  KKB 3.066    2.210      0.300   M 53.56323 64.09848 1410.891 638.4123
                  45     13  KKB 3.294    2.193      0.652   M 51.66717 63.49136 1428.063 651.1915
                  46     14  KKB 2.803    1.871      0.582   M 50.91185 60.90951 1236.438 660.8433",
                  header = TRUE, stringsAsFactors = FALSE)

Subset dataframe based on the condition in a column of another dataframe

Answers (1)

Related Questions