RxT
RxT

Reputation: 546

How to identify an ID with values in at least one column for all rows?

I have a data.frame with the cloumns: id, name, year, rop, des, cax, pcld.

I need to identify the id with the value "1" in at least one column (rop, des, cax e pcld) for all years (2014, 2015, 2016, 2017). After that, I need to identify those that have been discontinued over the years (Ex: 2014 = 1, 2015 = 1, 2016 = 0).

So, as a first step, I thought about transposing the data.frame, to look like this:

 id | name | conta | 2014 | 2015 | 2016 | 2017
 
 3243 | teste| rop   |   1  |   0  |  0    | 1
 3243 | teste| des   |   0  |   1  |  0    | 0
 3243 | teste| cax   |   0  |   0  |  1    | 0
 3243 | teste| pcld  |   0  |   0  |  0    | 0

In the next step, id "3243" would be identified. It has at least one value "1" in all years (columns).

I have been trying to transpose:

library(magrittr)

dados   %>% dplyr::select(id, name, year, rop, des, cax, pcld) %>% 
            tidyr::pivot_longer(
                                 cols = -c('id', 'name'),
                                 names_to = 'Conta'

                                ) 
 

Data:

structure(list(
    id = c(1111, 1111, 1111, 1111, 22222,22222, 22222, 22222, 33333, 33333,3243,3243,3243,3243), 
    name = c("empresa", "empresa", "empresa", "empresa","firma", "firma", "firma", "firma", "loja", "loja","teste","teste","teste","teste"), 
    year = c(2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017, 2014,2015,2014, 2015, 2016, 2017),
    rop = c(1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1), 
    des = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), 
    cax = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0), 
    pcld = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0)), 
    row.names = c(NA,14L), class = "data.frame")

There is a better way to do it?

Upvotes: 0

Views: 456

Answers (2)

Marcos Pérez
Marcos Pérez

Reputation: 1250

Try this:

 df %>% group_by(id,year) %>% 
        rowwise() %>%
        mutate(C=max(rop,des,cax,pcld)) %>%
        group_by(id) %>%
        filter(sum(C)==4) %>% 
        select(-C) %>%
        ungroup

Upvotes: 2

akrun
akrun

Reputation: 887941

if we want to subset the 'id' where at least one of the column from 'rop' to 'pcld' have 1 in its column for all rows, then do a group by 'id' and filter by creating a logical vector with rowSums, and wrap with all

library(dplyr)
dados %>% 
   group_by(id) %>% 
   filter(all(rowSums(select(cur_data(), rop:pcld)) > 0)) %>%
   ungroup

Or another option is if_any (dplyr version 1.0.4) with all

dados %>%
    group_by(id) %>%
    filter(all(if_any(rop:pcld, as.logical)))

i.e. if_any loops over the columns rop to pcld, convert the binary to logical (as.logical), returns TRUE/FALSE as a vector if any of the columns for a row have 1, then wrap the logical vector with all so that it returns a single TRUE (when all rows have at least one 1 for any column) or FALSE (when any row doesn't have a 1 for all the columns) to subset the 'id'

Upvotes: 1

Related Questions