HNSKD
HNSKD

Reputation: 1644

Categorise data based on the trend

I have a data on individuals with their titre levels. The titre levels are in ordered categories. Instead of using the labelled values, I have used the numeric values - this would allow me to see the trend. My logic as follows:

  1. If the values are monotonically increasing, i.e. either increases or remains the same, assign "uptrend"
  2. If the values are monotonically decreasing, i.e. either decreases or remains the same, assign "downtrend"
  3. If there's no difference, i.e. all values remain the same, assign "stable"
  4. Everything else would be considered "fluctuate"

However, I want to relax the criteria to allow for 1 "deviation" or "difference". Example (picture below), id==9 would be considered "fluctuate" in the original logic, but I want this to be "stable" since there's only a single value that's difference from the rest. Similarly for id %in% 10:12.

enter image description here

This is what I have done previously, using dplyr and tidyr, but I am pretty much stuck.

library(tidyverse)

df %>%
  
  # Create a duplicate column to convert into long format
  mutate(titre_level_sep = titre_level) %>% 
  separate_rows(titre_level_sep, sep = ",") %>% 
  
  # Calculate for each id
  group_by(id) %>% 
  mutate(
    # Trends and diff in titre levels
    diff = as.numeric(titre_level_sep) - lag(as.numeric(titre_level_sep)),
    increasing = all(diff >= 0, na.rm = TRUE),
    decreasing = all(diff <= 0, na.rm = TRUE),
    stable = all(diff == 0, na.rm = TRUE)) %>%
  ungroup() %>% 
  select(everything(), -c(diff, titre_level_sep)) %>% 
  distinct()  %>% 
  mutate(
    # Trends final classification
    trend2 = case_when(increasing & !decreasing & !stable ~ "uptrend",
                         !increasing & decreasing & !stable ~ "downtrend",
                         increasing & decreasing & stable ~ "stable",
                         !increasing & !decreasing & !stable ~ "fluctuate")) %>% 
  select(id, trend, trend2)

Here's the data I have used:

df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 
    titre_level = c("2,1,1,1,1", "4,4,4,4,4,1,1,1,1,1", "8,6,6,6,6,4", 
    "1,1,1,1,1,1,1,1,1,3,3", "4,4,7,10", "8,11", "1,1", "6,6,6,6,6,6,6,6,6", 
    "1,2,1,1,1,1,1,1,1", "8,8,6,8", "4,2,2,2,4,2,2,2,1", "1,1,2,4,4,1"
    ), trend = c("downtrend", "downtrend", "downtrend", "uptrend", 
    "uptrend", "uptrend", "stable", "stable", "stable", "stable", 
    "downtrend", "uptrend")), class = "data.frame", row.names = c(NA, 
12L))

Upvotes: 1

Views: 48

Answers (1)

Mark
Mark

Reputation: 12528

df %>%
  mutate(trend = map_chr(titre_level, \(s) {
    s %>% str_split(",", simplify = TRUE) %>%
    {case_when(
        all(diff(as.numeric(.)) == 0, na.rm = TRUE) ~ "stable",
        all(diff(as.numeric(.)) >= 0, na.rm = TRUE) ~ "uptrend",
        all(diff(as.numeric(.)) <= 0, na.rm = TRUE) ~ "downtrend",
        length(table(.)) == 2 & min(table(.)) == 1 ~ "deviation", # if there are two unique values in the titre level, and one of them appears only once, then it's a "deviation"
        TRUE ~ "flutuate")}
  }))
# A tibble: 12 × 3
      id titre_level           trend    
   <dbl> <chr>                 <chr>    
 1     1 2,1,1,1,1             downtrend
 2     2 4,4,4,4,4,1,1,1,1,1   downtrend
 3     3 8,6,6,6,6,4           downtrend
 4     4 1,1,1,1,1,1,1,1,1,3,3 uptrend  
 5     5 4,4,7,10              uptrend  
 6     6 8,11                  uptrend  
 7     7 1,1                   stable   
 8     8 6,6,6,6,6,6,6,6,6     stable   
 9     9 1,2,1,1,1,1,1,1,1     deviation
10    10 8,8,6,8               deviation
11    11 4,2,2,2,4,2,2,2,1     flutuate 
12    12 1,1,2,4,4,1           flutuate 

Upvotes: 1

Related Questions