Student0172
Student0172

Reputation: 93

How to identify specific rows in R (based on other values)

For my dataset I want a row for each year for each ID and I want to determine if they lived in an urban area or not (0/1). Because some ID’s moved within a year and therefore have two rows for that year, I want to identify if they have two rows for that specific year, which mean they lived in an urban and non-urban area in that year (so I can manually determine in Excel at where they belong).

I’ve already excluded the exact double rows (so they moved in a certain year, but the urbanisation didn’t change).

df <- df %>% distinct(ID, YEAR, URBAN, .keep_all = TRUE)




structure(t2A)
    # A tibble: 3,177,783 x 4
       ID ZIPCODE YEAR URBAN
            <dbl> <chr>  <chr> <dbl>
     1   1 1234AB   2013     0
     2   1 1234AB   2014     0
     3   1 1234AB   2015     0
     4   1 1234AB   2016     0
     5   1 1234AB   2017     0
     6   1 1234AB   2018     0
     7   2 5678CD   2013     0
     8   2 5678CD   2014     0
     9   2 5678CD   2015     0
    10   2 5678CD   2016     0
    # ... with 3,177,773 more rows

structure(list(ID= c(1, 1, 1, 1
), YEAR = c("2013", "2014", "2015", "2016"), URBAN = c(0, 
0, 0, 0)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))

Can you guys help me with identifying ID’s that have two rows for a specific year/have a 0 and 1 in the same year?

Edit: the example doesn't show any ID's with urbanisation 1, but there are and not all ID's are included all years :)

Upvotes: 0

Views: 209

Answers (2)

rj-nirbhay
rj-nirbhay

Reputation: 679

Below might be useful:

df <- df %>%
  dplyr::group_by(ID, YEAR) %>%
  dplyr::mutate(nIds=dplyr::n(),#count the occurance at unique ID and year combination
         URBAN_Flag=sum(URBAN), ##Urban flag for those who are from urban
         moved=dplyr::if_else(nIds>1,1,0)) %>% 
  dplyr::select(-c(nIds))

You can deselect the columns if not needed

Upvotes: 1

edsandorf
edsandorf

Reputation: 767

First, we create some dummy data

library(tidyverse)

db <- tibble(
  id = c(1, 1, 1, 2, 2, 2),
  year = c(2000, 2000, 2001, 2001, 2002, 2003),
  urban = c(0, 1, 0, 0, 0, 0)
)

We see that person one moved in 2000.

     id  year urban
  <dbl> <dbl> <dbl>
1     1  2000     0
2     1  2000     1
3     1  2001     0
4     2  2001     0
5     2  2002     0
6     2  2003     0

Now, we can group by id and year and count the number of rows. We can use the count value to create a dummy whether or not they moved in a given year.

db %>%
  group_by(id, year) %>%
  summarize(rows = n()) %>%
  mutate(
    moved = ifelse(rows == 2, 1, 0)
  )

Which gives the result:

     id  year  rows moved
  <dbl> <dbl> <int> <dbl>
1     1  2000     2     1
2     1  2001     1     0
3     2  2001     1     0
4     2  2002     1     0
5     2  2003     1     0

Upvotes: 0

Related Questions