Wilson Souza

Reputation: 860

Filter rows that are matched on multiple conditions

I have the following data:

dput(output[1:20])
structure(list(number_of_cols = c(20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 
20L), number_of_rows = c(17258L, 17258L, 20059L, 20059L, 16943L, 
16943L, 19090L, 19090L, 17846L, 17846L, 18879L, 18879L, 21076L, 
21076L, 19978L, 19978L, 16625L, 16660L, 15938L, 15938L), name_of_file = c("Basler_2020-12-01 00_34_52.441983_frames", 
"Basler_2020-12-01 00_34_52.441983_frames_automat", "Basler_2020-12-01 01_35_01.902191_frames", 
"Basler_2020-12-01 01_35_01.902191_frames_automat", "Basler_2020-12-01 02_35_11.367056_frames", 
"Basler_2020-12-01 02_35_11.367056_frames_automat", "Basler_2020-12-01 03_35_20.855642_frames", 
"Basler_2020-12-01 03_35_20.855642_frames_automat", "Basler_2020-12-01 04_35_30.251926_frames", 
"Basler_2020-12-01 04_35_30.251926_frames_automat", "Basler_2020-12-01 05_35_39.708837_frames", 
"Basler_2020-12-01 05_35_39.708837_frames_automat", "Basler_2020-12-01 06_35_49.255905_frames", 
"Basler_2020-12-01 06_35_49.255905_frames_automat", "Basler_2020-12-01 07_35_58.696052_frames", 
"Basler_2020-12-01 07_35_58.696052_frames_automat", "Basler_2020-12-01 18_04_05.227985_frames", 
"Basler_2020-12-01 18_04_05.227985_frames_automat", "Basler_2020-12-01 19_04_15.002675_frames", 
"Basler_2020-12-01 19_04_15.002675_frames_automat")), row.names = c(NA, 
-20L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001aed1e54460>)

Each pair of consecutive rows describes two corresponding files; the only difference between their names is the suffix "_automat" at the end of one of them. However, some of these pairs have different numbers of rows. For example, the file in line 17 has 16625 rows and the file in line 18 has 16660 rows.

I would like to move the rows of this data.frame where the number of rows differs between the corresponding files (with and without "_automat" in the name) into another data.frame.

Expected output:

structure(list(number_of_cols = c(20L, 20L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L), 
    number_of_rows = c(17258L, 17258L, 20059L, 20059L, 16943L, 
    16943L, 19090L, 19090L, 17846L, 17846L, 18879L, 18879L, 21076L, 
    21076L, 19978L, 19978L, 15938L, 15938L), name_of_file = c("Basler_2020-12-01 00_34_52.441983_frames", 
    "Basler_2020-12-01 00_34_52.441983_frames_automat", "Basler_2020-12-01 01_35_01.902191_frames", 
    "Basler_2020-12-01 01_35_01.902191_frames_automat", "Basler_2020-12-01 02_35_11.367056_frames", 
    "Basler_2020-12-01 02_35_11.367056_frames_automat", "Basler_2020-12-01 03_35_20.855642_frames", 
    "Basler_2020-12-01 03_35_20.855642_frames_automat", "Basler_2020-12-01 04_35_30.251926_frames", 
    "Basler_2020-12-01 04_35_30.251926_frames_automat", "Basler_2020-12-01 05_35_39.708837_frames", 
    "Basler_2020-12-01 05_35_39.708837_frames_automat", "Basler_2020-12-01 06_35_49.255905_frames", 
    "Basler_2020-12-01 06_35_49.255905_frames_automat", "Basler_2020-12-01 07_35_58.696052_frames", 
    "Basler_2020-12-01 07_35_58.696052_frames_automat", "Basler_2020-12-01 19_04_15.002675_frames", 
    "Basler_2020-12-01 19_04_15.002675_frames_automat")), row.names = c(NA, 
-18L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001aed1e54460>)

and/or:

structure(list(number_of_cols = c(20L, 20L), number_of_rows = c(16625L, 
16660L), name_of_file = c("Basler_2020-12-01 18_04_05.227985_frames", 
"Basler_2020-12-01 18_04_05.227985_frames_automat")), row.names = c(NA, 
-2L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001aed1e54460>)

Thanks

Upvotes: 0

Views: 145

Answers (2)

B. Christian Kamgang

Reputation: 6529

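You could solve your problem as follows, where A is your data.table:

library(data.table)

# Flag each pair (same name once a trailing "_automat" is removed) whose row counts agree
A[, same := uniqueN(number_of_rows) == 1, by = sub("_automat$", "", name_of_file)]

df1 = A[(same),][, same := NULL]   # pairs with equal row counts
df2 = A[!(same),][, same := NULL]  # pairs with differing row counts
A[, same := NULL]                  # drop the helper column from A again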
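
To see what the grouping does, here is a minimal sketch run on a four-row sample taken from the question's data (the last two pairs). The names `matching` and `mismatched` are only illustrative, and the sample is rebuilt by hand rather than read from the original `output` object.

```r
library(data.table)

# Four rows copied from the question: one mismatched pair, one matching pair
A <- data.table(
  number_of_cols = c(20L, 20L, 20L, 20L),
  number_of_rows = c(16625L, 16660L, 15938L, 15938L),
  name_of_file = c(
    "Basler_2020-12-01 18_04_05.227985_frames",
    "Basler_2020-12-01 18_04_05.227985_frames_automat",
    "Basler_2020-12-01 19_04_15.002675_frames",
    "Basler_2020-12-01 19_04_15.002675_frames_automat"
  )
)

# Group by the file name with a trailing "_automat" stripped, so both
# files of a pair share one key; a pair is consistent when it has a
# single distinct row count
A[, same := uniqueN(number_of_rows) == 1L,
  by = sub("_automat$", "", name_of_file)]

matching   <- A[same == TRUE][, same := NULL]   # subsetting copies, so A is untouched here
mismatched <- A[same == FALSE][, same := NULL]
A[, same := NULL]                               # remove the helper column from A
```

On this sample, `mismatched` should hold the two 18_04_05 rows and `matching` the two 19_04_15 rows.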

Upvotes: 1

Mohamed Desouky

Reputation: 4425

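We can use the dplyr package, where df is your data, df1 is your first expected data.frame and df2 is the second. Note that this groups by the dimensions rather than by the file name, so it assumes that each pair's row count is not shared by any other file.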

library(dplyr)

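# Rows whose dimensions occur exactly twice (a matching pair)
df1 <- df |> group_by(number_of_rows, number_of_cols) |> mutate(n = n()) |> filter(n == 2) |> ungroup() |> select(-n)

# Rows whose dimensions occur only once (no matching partner)
df2 <- df |> group_by(number_of_rows, number_of_cols) |> mutate(n = n()) |> filter(n == 1) |> ungroup() |> select(-n)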
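
If unrelated files might share a row count, one possible variant is to group on the file name with the "_automat" suffix stripped instead of on the dimensions. This is a sketch rather than part of the answer above: the helper columns `pair` and `same` are hypothetical names, and the sample is four rows copied from the question.

```r
library(dplyr)

# Four rows copied from the question: one mismatched pair, one matching pair
df <- data.frame(
  number_of_cols = c(20L, 20L, 20L, 20L),
  number_of_rows = c(16625L, 16660L, 15938L, 15938L),
  name_of_file = c(
    "Basler_2020-12-01 18_04_05.227985_frames",
    "Basler_2020-12-01 18_04_05.227985_frames_automat",
    "Basler_2020-12-01 19_04_15.002675_frames",
    "Basler_2020-12-01 19_04_15.002675_frames_automat"
  )
)

# Key each row by its base file name, then flag pairs that have a
# single distinct row count
flagged <- df |>
  group_by(pair = sub("_automat$", "", name_of_file)) |>
  mutate(same = n_distinct(number_of_rows) == 1) |>
  ungroup()

df1 <- flagged |> filter(same)  |> select(-pair, -same)  # consistent pairs
df2 <- flagged |> filter(!same) |> select(-pair, -same)  # inconsistent pairs
```

Grouping on the name keeps the result independent of whether two different recordings happen to have the same number of rows.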

Upvotes: 1
