Waldemar Ortiz

Reputation: 75

How can I automate age class advancements in R

Sorry if this is a simple question; I am fairly new to R and still trying to grasp some of these concepts. I am having problems automating age class advancements in R, and I was wondering if I could get some help with how to address this.

Currently, I am attempting to use if-else statements to solve my problem, but I feel that I am shooting in the dark on how to format them correctly. Basically, I need my code to recognize the season of the observation. If the season is not 3, then the output should be the original age class (if it's the first observation) or the prior observation's age class.

If the season is 3, then I would need to have an age class advancement. For example, if the individual was a yearling in the prior observation, the next season 3 entry would shift the individual's age class from Yearling to Adult. But, if the individual was an adult, then the age class would remain the same.

Below is an example of what I would need the data to look like.

+----+--------+--------------------+----------------+
| ID | Season | Original Age Class | Desired Output |
+----+--------+--------------------+----------------+
|  1 |      1 | New_Born           | New_Born       |
|  1 |      2 |                    | New_Born       |
|  1 |      3 |                    | Yearling       |
|  1 |      4 |                    | Yearling       |
|  1 |      1 |                    | Yearling       |
|  1 |      2 |                    | Yearling       |
|  1 |      3 |                    | Adult          |
|  1 |      4 |                    | Adult          |
|  1 |      1 |                    | Adult          |
|  1 |      2 |                    | Adult          |
+----+--------+--------------------+----------------+

I would appreciate any help with my problem and I thank you in advance.

Upvotes: 0

Views: 77

Answers (2)

IceCreamToucan

Reputation: 28695

If you have a data frame of IDs and seasons as in your question, and an ordered vector of age classes, as below:

df <- data.frame(ID = rep(1, 10), Season = rep_len(1:4, 10))
age_classes <- c('New_Born', 'Yearling', 'Adult')

Then you can index the age_classes vector with cumsum(Season == 3) + 1. The cumulative sum counts how many times Season has been equal to 3 up to and including each row, and adding 1 turns that count into a position in age_classes, which gives that row's age_class.

library(data.table)
setDT(df)

df[, age_class := age_classes[cumsum(Season == 3) + 1], 
   by = ID]

df
#     ID Season age_class
#  1:  1      1  New_Born
#  2:  1      2  New_Born
#  3:  1      3  Yearling
#  4:  1      4  Yearling
#  5:  1      1  Yearling
#  6:  1      2  Yearling
#  7:  1      3     Adult
#  8:  1      4     Adult
#  9:  1      1     Adult
# 10:  1      2     Adult
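
To see where those values come from, it may help to look at the index on its own. The following is a minimal base R sketch, not part of the data.table code above; the names `season` and `idx` are made up for illustration.

```r
# Illustrative only: `season` and `idx` are hypothetical names.
age_classes <- c("New_Born", "Yearling", "Adult")
season <- rep_len(1:4, 10)

# season == 3 is TRUE on rows 3 and 7; cumsum() counts the TRUEs seen so far.
# Adding 1 shifts the count (0, 1, 2) to a valid R position (1, 2, 3).
idx <- cumsum(season == 3) + 1
idx
# 1 1 2 2 2 2 3 3 3 3

# Indexing with idx picks one age class per row.
result <- age_classes[idx]
```

Note that a third season 3 would push the index to 4, and age_classes[4] is NA, so this simple form assumes no individual is observed past its last advancement.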

Edit:

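If each ID has a starting age class, you can add the index of that class in the age_classes vector, instead of adding 1, to the cumsum output. The pmin() call in the code below caps the index at the last class, so an adult stays an adult after further season 3 observations.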

Starting data

df <- data.frame(ID = rep(1, 10), Season = rep_len(1:4, 10), 
                 orig_age_class = c('New_Born', rep(NA, 9)))
age_classes <- c('New_Born', 'Yearling', 'Adult')



#    ID Season orig_age_class
# 1   1      1       New_Born
# 2   1      2           <NA>
# 3   1      3           <NA>
# 4   1      4           <NA>
# 5   1      1           <NA>
# 6   1      2           <NA>
# 7   1      3           <NA>
# 8   1      4           <NA>
# 9   1      1           <NA>
# 10  1      2           <NA>

Code and output

library(data.table)
setDT(df)

df[, age_class := {
        start_ind <- match(orig_age_class[1], age_classes)
        n3 <- cumsum(Season == 3)
        age_classes[pmin(length(age_classes), n3 + start_ind)]}, 
   by = ID]

df
#     ID Season orig_age_class age_class
#  1:  1      1       New_Born  New_Born
#  2:  1      2           <NA>  New_Born
#  3:  1      3           <NA>  Yearling
#  4:  1      4           <NA>  Yearling
#  5:  1      1           <NA>  Yearling
#  6:  1      2           <NA>  Yearling
#  7:  1      3           <NA>     Adult
#  8:  1      4           <NA>     Adult
#  9:  1      1           <NA>     Adult
# 10:  1      2           <NA>     Adult

Upvotes: 1

Rui Barradas

Reputation: 76460

A base R solution could be the following.

ageclass <- c('New_Born', 'Yearling', 'Adult')

sp <- split(df1, df1$ID)
result <- lapply(sp, function(DF){
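  # position of this ID's starting age class (assumes exactly one is recorded per ID)
  i <- which(ageclass %in% DF[[3]])
  # advance one class for every Season 3 seen so far, starting from class i
  f <- cumsum(DF[['Season']] == 3) + i
  # cap at the last class so that adults stay adults
  f[f > length(ageclass)] <- length(ageclass)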
  DF[['New']] <- ageclass[f]
  DF
})

result <- do.call(rbind, result)
row.names(result) <- NULL
result

Note that I have tested with Original Age Class equal to "Yearling" and it worked.

Data.

x <-"
+----+--------+--------------------+----------------+
  | ID | Season | `Original Age Class` | `Desired Output` |
  +----+--------+--------------------+----------------+
  |  1 |      1 | New_Born           | New_Born       |
  |  1 |      2 |                    | New_Born       |
  |  1 |      3 |                    | Yearling       |
  |  1 |      4 |                    | Yearling       |
  |  1 |      1 |                    | Yearling       |
  |  1 |      2 |                    | Yearling       |
  |  1 |      3 |                    | Adult          |
  |  1 |      4 |                    | Adult          |
  |  1 |      1 |                    | Adult          |
  |  1 |      2 |                    | Adult          |
  +----+--------+--------------------+----------------+"


df1 <- data.table::fread(gsub('\\+.+\\n' ,'', x, perl = TRUE), drop=c(1,6))

Upvotes: 0
