alexvc
alexvc

Reputation: 67

How to create a column with numbered instance of occurence by ID and year

I am trying to find a way to create a column in my dataframe that will list out occurrences of each unique combination of personID and fiscal year.

I have a dataframe set up with variables like so:

Person.Id   Reported.Fiscal.Year  

250           2017
250           2017
250           2018
300           2018
511           2019
300           2018
700           2017

So in this example I want to create an additional column in the df above that has something like 'year' which would list year 1 for both occurrences of id 250 and year 2017, but would have year 2 for id 250 and fiscal year 2018. Like so:

Person.Id   Reported.Fiscal.Year    year

250           2017                   1
250           2017                   1
250           2018                   2
300           2018                   1
511           2019                   1
300           2018                   1
700           2017                   1

I've tried the following code:

df1 <- df1 %>% arrange(Person.Id,Reported.Fiscal.Year)

df2<- df1 %>% group_by(Person.Id,Reported.Fiscal.Year) %>% mutate(year=row_number())

But this results in a data frame that looks like this (essentially counting the occurrences of each year by ID):

Person.Id   Reported.Fiscal.Year    year

250           2017                   1
250           2017                   2
250           2018                   1
300           2018                   1
511           2019                   1
300           2018                   2
700           2017                   1

Upvotes: 1

Views: 313

Answers (3)

user13653858
user13653858

Reputation:

If I understand correctly, you want to enumarate the occurences of IDs accross the years?

I have used pieces of your code, you were close. Only you need to choose distinct rows to count the occurences with:

See comments inside the code:

library(dplyr)

# your example data
df <- read.table(header = TRUE, text = "
  Person.Id   Reported.Fiscal.Year  
  250           2017
  250           2017
  250           2018
  300           2018
  511           2019
  300           2018
  700           2017
")

# 1. arrange by ids and years (this is what you did)
# 2. group by ids to be able to count different fiscal years
# 3. choose only unique combinations of ids and fiscal years
# 4. use row numbers (as you did)
# 5. merge new column to original data
df %>%
  arrange(Person.Id, Reported.Fiscal.Year) %>%
  group_by(Person.Id) %>%
  distinct() %>%
  mutate(year = row_number()) %>%
  inner_join(df, .)
#> Joining, by = c("Person.Id", "Reported.Fiscal.Year")
#>   Person.Id Reported.Fiscal.Year year
#> 1       250                 2017    1
#> 2       250                 2017    1
#> 3       250                 2018    2
#> 4       300                 2018    1
#> 5       511                 2019    1
#> 6       300                 2018    1
#> 7       700                 2017    1

Created on 2020-07-06 by the reprex package (v0.3.0)

Upvotes: 2

RyanFrost
RyanFrost

Reputation: 1428

Here's an alternative to @Petr & @Bruno's very nice join-based solutions. This one works by building a cumulative count of unique years for each person.

library(readr)
df <- read_table("Person.Id   Reported.Fiscal.Year  

250           2017
250           2017
250           2018
300           2018
511           2019
300           2018
700           2017")

library(dplyr)
df %>%
  arrange(Person.Id, Reported.Fiscal.Year) %>%
  group_by(Person.Id) %>%
  mutate(year = cumsum(!duplicated(Reported.Fiscal.Year)))
#> # A tibble: 7 x 3
#> # Groups:   Person.Id [4]
#>   Person.Id Reported.Fiscal.Year  year
#>       <dbl>                <dbl> <int>
#> 1       250                 2017     1
#> 2       250                 2017     1
#> 3       250                 2018     2
#> 4       300                 2018     1
#> 5       300                 2018     1
#> 6       511                 2019     1
#> 7       700                 2017     1

Created on 2020-07-06 by the reprex package (v0.3.0)

Upvotes: 2

Bruno
Bruno

Reputation: 4150

Welcome to SO!

Had to summarise your data before, maybe someone can provide a simpler solution

library(tidyverse)

df_example <- read_table("Person.Id   Reported.Fiscal.Year  

250           2017
250           2017
250           2018
300           2018
511           2019
300           2018
700           2017")


df_example_summary <- df_example %>% 
  group_by(Person.Id,Reported.Fiscal.Year) %>% 
  summarise(number_reports = n(),,.groups = "drop_last") %>% 
  mutate(Year = row_number()) %>% 
  ungroup()

df_example %>% 
  left_join(df_example_summary)
#> Joining, by = c("Person.Id", "Reported.Fiscal.Year")
#> # A tibble: 7 x 4
#>   Person.Id Reported.Fiscal.Year number_reports  Year
#>       <dbl>                <dbl>          <int> <int>
#> 1       250                 2017              2     1
#> 2       250                 2017              2     1
#> 3       250                 2018              1     2
#> 4       300                 2018              2     1
#> 5       511                 2019              1     1
#> 6       300                 2018              2     1
#> 7       700                 2017              1     1

Created on 2020-07-06 by the reprex package (v0.3.0)

Upvotes: 2

Related Questions