Student0172
Student0172

Reputation: 93

How to select (four) specific rows (multiple times) based on a column value in R?

I only want to select the ID's that are in my dataframe for all years, from 2013 untill 2016 (so four times). In that case ID's with only four rows are left (panel data, each ID has 1 row for each year). I already made sure my dataframe only covers the years I need (2013, 2014, 2015, and 2016), but I want to exclude the ID's that have less than 4 years/rows in my dataframe.

This is the structure of my dataframe:

 tibble [909,587 x 26] (S3: tbl_df/tbl/data.frame)
     $ ID                         : num [1:909587] 12 12 12 12 16 16 16 16...
     $ Gender                     : num [1:909587] 2 2 2 2 1 1 1 1 1 1 ...
      ..- attr(*, "format.spss")= chr "F10.0"
     $ Year                       : chr [1:909587] "2016" "2013" "2014" "2015" ...
      ..- attr(*, "format.spss")= chr "F9.3"
     $ Size                       : num [1:909587] 1983 1999 1951 1976 902 ...
     $ Costs                      : num [1:909587] 2957.47 0 0.34 1041.67 0 ...
     $ Urbanisation               : num [1:909587] 2 3 3 2 3 3 2 2 2 3 ...
     $ Age                        : num [1:909587] 92 89 90 91 82 83 22 23 24 65 ...

How can I achieve that?

Thank you!

Upvotes: 1

Views: 693

Answers (2)

Chuck P
Chuck P

Reputation: 3923

Just to capture @Jasonaizkains answer from the comments field above, since pivoting is not strictly necessary in this case with some play data.

library(dplyr)
id <- rep(10:13, 4) # four subjects
year <- rep(2013:2016, each = 4) # four years
gender <- sample(1:2, 16, replace = TRUE)
play <- tibble(id, gender, year) # data.frame of 16

play <- play[-9,] # removes row for id 10 in 2015

# Removes all entries for the right id number
play %>% group_by(id) %>% filter(n_distinct(year) >= 4) %>% ungroup()
#> # A tibble: 12 x 3
#>       id gender  year
#>    <int>  <int> <int>
#>  1    11      1  2013
#>  2    12      2  2013
#>  3    13      2  2013
#>  4    11      1  2014
#>  5    12      2  2014
#>  6    13      1  2014
#>  7    11      2  2015
#>  8    12      2  2015
#>  9    13      2  2015
#> 10    11      2  2016
#> 11    12      2  2016
#> 12    13      1  2016

Upvotes: 2

Bruno
Bruno

Reputation: 4150

Pivot your df

df %>% pivot_wider(names_from = Year,values_from = Age)

Filter the na's rows out of columns 2013,2014,2015,2016

Pivot back

df %>% pivot_longer(2013:2016)

Upvotes: 2

Related Questions