Reputation: 11
I have two data frames where each line represent data from one individual. Lines in the first data frame (that enter the specific analysis of geometric morphometry) correspond to the lines in the second data frame (additional descriptions of animals as sampling site or sex). I would like to subset the first data frame based on the condition form the second data frame (e.g. select all lines of the first data frame that are females, but sex of the animal is defined in the second dataframe). It is possible to do it by adding new column to the first data frame, subset it based on this new column and remove the column. Is there any other more elegant way to do it?
df1
[,1] [,2] [,3] [,4] [,5] [,6]
IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079
IMGP7012.JPG -0.07059829 0.07905529 0.021803102 0.07480276 0.04849282 0.07270644
IMGP7013.JPG -0.07176010 0.08561111 0.009568661 0.08297752 0.06374573 0.08272648
IMGP7014.JPG -0.06751993 0.08895038 0.016800152 0.08799522 0.04776876 0.08100145
IMGP7015.JPG -0.07945826 0.07844136 0.008176800 0.07431915 0.06471417 0.07348312
IMGP7017.JPG -0.06587874 0.09280032 0.010204330 0.09085868 0.05290771 0.08739235
df2
number site m m..evis. m..gonads. sex SL TL AP RP
37 10 KB 1.263 1.003 0.136 F 39.38949 47.72564 NA NA
38 11 KB 4.215 3.510 0.093 F 53.48064 65.29663 NA NA
39 12 KB 3.508 2.997 0.079 F 51.59589 64.76600 NA NA
40 13 KB 3.250 2.752 0.085 F 49.55853 61.74319 NA NA
41 14 KB 3.596 3.149 0.101 F 51.42303 64.79511 NA NA
42 10 KKB 3.257 2.451 0.270 M 55.07909 67.52057 1468.017 598.9462
43 11 KKB 3.493 2.275 0.666 M 54.24882 65.61726 1722.414 757.1050
44 12 KKB 3.066 2.210 0.300 M 53.56323 64.09848 1410.891 638.4123
45 13 KKB 3.294 2.193 0.652 M 51.66717 63.49136 1428.063 651.1915
46 14 KKB 2.803 1.871 0.582 M 50.91185 60.90951 1236.438 660.8433
df1 after subset
[,1] [,2] [,3] [,4] [,5] [,6]
IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079
Upvotes: 0
Views: 1170
Reputation: 39174
df1[df2$sex %in% "F", ]
# [,1] [,2] [,3] [,4] [,5] [,6]
# IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
# IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
# IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
# IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
# IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079
Explanation
Your df1
looks like a matrix
, not a data.frame
. But the solution I provided will also work if df1
is a data frame.
df2$sex %in% "F"
reports if sex
matches F
. and reports a logical vector with TRUE
and FALSE
. After that, you can use that to subset df1
.
Data
df1 <- matrix(c(-0.07612235, 0.08189661, 0.020690012, 0.07532420, 0.05373111, 0.07139840,
-0.06759482, 0.09449720, 0.022907275, 0.08807724, 0.05953926, 0.08256468,
-0.06902234, 0.08418980, 0.013522385, 0.08186618, 0.05375763, 0.07769076,
-0.07201136, 0.08475765, 0.009462017, 0.08080315, 0.06148776, 0.07059229,
-0.08112908, 0.08485488, 0.037193459, 0.07971364, 0.05834018, 0.07917079,
-0.07059829, 0.07905529, 0.021803102, 0.07480276, 0.04849282, 0.07270644,
-0.07176010, 0.08561111, 0.009568661, 0.08297752, 0.06374573, 0.08272648,
-0.06751993, 0.08895038, 0.016800152, 0.08799522, 0.04776876, 0.08100145,
-0.07945826, 0.07844136, 0.008176800, 0.07431915, 0.06471417, 0.07348312,
-0.06587874, 0.09280032, 0.010204330, 0.09085868, 0.05290771, 0.08739235),
ncol = 6, byrow = TRUE)
rownames(df1) <- c("IMGP6995.JPG", "IMGP6997.JPG", "IMGP6998.JPG", "IMGP6999.JPG",
"IMGP7001.JPG", "IMGP7012.JPG", "IMGP7013.JPG", "IMGP7014.JPG",
"IMGP7015.JPG", "IMGP7017.JPG")
df2 <- read.table(text = " number site m m..evis. m..gonads. sex SL TL AP RP
37 10 KB 1.263 1.003 0.136 F 39.38949 47.72564 NA NA
38 11 KB 4.215 3.510 0.093 F 53.48064 65.29663 NA NA
39 12 KB 3.508 2.997 0.079 F 51.59589 64.76600 NA NA
40 13 KB 3.250 2.752 0.085 F 49.55853 61.74319 NA NA
41 14 KB 3.596 3.149 0.101 F 51.42303 64.79511 NA NA
42 10 KKB 3.257 2.451 0.270 M 55.07909 67.52057 1468.017 598.9462
43 11 KKB 3.493 2.275 0.666 M 54.24882 65.61726 1722.414 757.1050
44 12 KKB 3.066 2.210 0.300 M 53.56323 64.09848 1410.891 638.4123
45 13 KKB 3.294 2.193 0.652 M 51.66717 63.49136 1428.063 651.1915
46 14 KKB 2.803 1.871 0.582 M 50.91185 60.90951 1236.438 660.8433",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2