Filter rows that are matched on multiple conditions

Question

I have the following data:

dput(output[1:20])
structure(list(number_of_cols = c(20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L), number_of_rows = c(17258L, 17258L, 20059L, 20059L, 16943L, 
16943L, 19090L, 19090L, 17846L, 17846L, 18879L, 18879L, 21076L, 
21076L, 19978L, 19978L, 16625L, 16660L, 15938L, 15938L), name_of_file = c("Basler_2020-12-01 00_34_52.441983_frames", 
"Basler_2020-12-01 00_34_52.441983_frames_automat", "Basler_2020-12-01 01_35_01.902191_frames", 
"Basler_2020-12-01 01_35_01.902191_frames_automat", "Basler_2020-12-01 02_35_11.367056_frames", 
"Basler_2020-12-01 02_35_11.367056_frames_automat", "Basler_2020-12-01 03_35_20.855642_frames", 
"Basler_2020-12-01 03_35_20.855642_frames_automat", "Basler_2020-12-01 04_35_30.251926_frames", 
"Basler_2020-12-01 04_35_30.251926_frames_automat", "Basler_2020-12-01 05_35_39.708837_frames", 
"Basler_2020-12-01 05_35_39.708837_frames_automat", "Basler_2020-12-01 06_35_49.255905_frames", 
"Basler_2020-12-01 06_35_49.255905_frames_automat", "Basler_2020-12-01 07_35_58.696052_frames", 
"Basler_2020-12-01 07_35_58.696052_frames_automat", "Basler_2020-12-01 18_04_05.227985_frames", 
"Basler_2020-12-01 18_04_05.227985_frames_automat", "Basler_2020-12-01 19_04_15.002675_frames", 
"Basler_2020-12-01 19_04_15.002675_frames_automat")), row.names = c(NA, 
-20L), class = c("data.table", "data.frame"))

Each pair of consecutive rows corresponds to a pair of matching files; the only difference between the two file names is the suffix "_automat" at the end of one of them. However, some pairs have different numbers of rows. For example, the file in row 17 has 16625 rows and the file in row 18 has 16660 rows.

I would like to separate the rows of this data.frame where number_of_rows differs between the corresponding files (with and without "_automat" in the name) into another data.frame.

Expected output:

structure(list(number_of_cols = c(20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L), 
    number_of_rows = c(17258L, 17258L, 20059L, 20059L, 16943L, 
    16943L, 19090L, 19090L, 17846L, 17846L, 18879L, 18879L, 21076L, 
    21076L, 19978L, 19978L, 15938L, 15938L), name_of_file = c("Basler_2020-12-01 00_34_52.441983_frames", 
    "Basler_2020-12-01 00_34_52.441983_frames_automat", "Basler_2020-12-01 01_35_01.902191_frames", 
    "Basler_2020-12-01 01_35_01.902191_frames_automat", "Basler_2020-12-01 02_35_11.367056_frames", 
    "Basler_2020-12-01 02_35_11.367056_frames_automat", "Basler_2020-12-01 03_35_20.855642_frames", 
    "Basler_2020-12-01 03_35_20.855642_frames_automat", "Basler_2020-12-01 04_35_30.251926_frames", 
    "Basler_2020-12-01 04_35_30.251926_frames_automat", "Basler_2020-12-01 05_35_39.708837_frames", 
    "Basler_2020-12-01 05_35_39.708837_frames_automat", "Basler_2020-12-01 06_35_49.255905_frames", 
    "Basler_2020-12-01 06_35_49.255905_frames_automat", "Basler_2020-12-01 07_35_58.696052_frames", 
    "Basler_2020-12-01 07_35_58.696052_frames_automat", "Basler_2020-12-01 19_04_15.002675_frames", 
    "Basler_2020-12-01 19_04_15.002675_frames_automat")), row.names = c(NA, 
-18L), class = c("data.table", "data.frame"))

and/or:

structure(list(number_of_cols = c(20L, 20L), number_of_rows = c(16625L, 
16660L), name_of_file = c("Basler_2020-12-01 18_04_05.227985_frames", 
"Basler_2020-12-01 18_04_05.227985_frames_automat")), row.names = c(NA, 
-2L), class = c("data.table", "data.frame"))

Thanks

B. Christian Kamgang · Accepted Answer

You could solve your problem as follows (here A is your data.table, called output in the question):

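# Flag each pair: TRUE when all its files have the same number_of_rows.
# A pair is identified by the file name with a trailing "_automat" removed.
A[, same := uniqueN(number_of_rows)==1, by=sub("_automat$", "", name_of_file)]

df1 = A[(same),][, same := NULL]   # pairs whose row counts match
df2 = A[!(same),][, same := NULL]  # pairs whose row counts differ
A[, same := NULL]                  # drop the helper column from A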
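
To try the approach above in isolation, here is a minimal sketch on a six-row subset of the question's data (rows 15 to 20). The names dt, pair_id, keep, matched and mismatched are illustrative only and do not come from the original post; the grouping logic is the same as in the answer.

```r
library(data.table)

# Six rows taken from the question's data (rows 15 to 20)
dt <- data.table(
  number_of_cols = rep(20L, 6),
  number_of_rows = c(19978L, 19978L, 16625L, 16660L, 15938L, 15938L),
  name_of_file = c(
    "Basler_2020-12-01 07_35_58.696052_frames",
    "Basler_2020-12-01 07_35_58.696052_frames_automat",
    "Basler_2020-12-01 18_04_05.227985_frames",
    "Basler_2020-12-01 18_04_05.227985_frames_automat",
    "Basler_2020-12-01 19_04_15.002675_frames",
    "Basler_2020-12-01 19_04_15.002675_frames_automat"
  )
)

# Columns to keep in the results (hypothetical helper variable)
keep <- c("number_of_cols", "number_of_rows", "name_of_file")

# Pair key: the file name with a trailing "_automat" removed
dt[, pair_id := sub("_automat$", "", name_of_file)]

# TRUE when every row of the pair has the same number_of_rows
dt[, same := uniqueN(number_of_rows) == 1L, by = pair_id]

# Split into the two result tables, without the helper columns
matched    <- dt[same == TRUE,  keep, with = FALSE]
mismatched <- dt[same == FALSE, keep, with = FALSE]

# Remove the helper columns from the original table
dt[, c("pair_id", "same") := NULL]
```

Note that a file without a partner forms a group of one, so uniqueN() returns 1 and the file ends up among the matched rows. If unpaired files can occur in your data, adding a condition such as .N == 2 to the flag would be one way to separate them.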
