nopainnogain

Reputation: 123

Categorizing Data frame with R

I have the following sample code, which builds one data frame containing information for more than one ID. I want to sort the IDs into defined categories: for each ID, I want the percentage change at a specific, given time (e.g. here t=10) with respect to its baseline value, and the category that this change falls into returned as output. The detailed steps of my calculation are explained below.

a=c(100,105,126,130,150,100,90,76,51,40)
t=c(0,5,10,20,30)
t=rep(t,2)
ID=c(1,1,1,1,1,2,2,2,2,2)
data=data.frame(ID,t,a)   

My desired calculation

 1) For every ID, the "a" value at t=0 is the baseline
 2) Computation
    e.g. at a given t=10 (has to be provided), take the corresponding "a" value
    %Change (answer) = (taken a value - baseline) / baseline
 3) Compare the answer against the following defined categories
    #category
    1 - if answer > 0.25
    2 - if -0.30 < answer < 0.25
    3 - if -1.0 < answer < -0.30
    4 - if answer == -1.0
 4) Return the value of the category

Desired Output

 ID My_Answer
 1    1
 2    3

Can anybody help me with this? I understand the flow of my computation, but I am not aware of an efficient way of doing it, as I have many IDs in that data frame. Thank you.

Upvotes: 0

Views: 1966

Answers (1)

AndrewMacDonald

Reputation: 2950

It's easier to do math with columns than with rows. So the first step is to move baseline numbers into their own columns, then use cut to define these groups:

library(dplyr)

foo <- data %>%
  filter(t == 0) %>%                      # baseline rows: one per ID
  left_join(data %>%
              filter(t != 0),             # all later time points
            by = "ID") %>%                # a.x = baseline, a.y = later value
  mutate(percentchange = (a.y - a.x) / a.x,
         # right = FALSE gives [-1, -0.3), [-0.3, 0.25), [0.25, Inf]
         My_Answer = cut(percentchange, breaks = c(-1, -0.3, 0.25, Inf),
                         right = FALSE, include.lowest = TRUE, labels = c("g3","g2","g1")),
         My_Answer = as.character(My_Answer),
         # a change of exactly -1 is its own category
         My_Answer = ifelse(percentchange == -1, "g4", My_Answer)) %>%
  select(ID, t = t.y, My_Answer)

foo   # shown as it looks before the final select(), so the intermediate columns are visible
  ID t.x a.x t.y a.y percentchange My_Answer
1  1   0 100   5 105          0.05        g2
2  1   0 100  10 126          0.26        g1
3  1   0 100  20 130          0.30        g1
4  1   0 100  30 150          0.50        g1
5  2   0 100   5  90         -0.10        g2
6  2   0 100  10  76         -0.24        g2
7  2   0 100  20  51         -0.49        g3
8  2   0 100  30  40         -0.60        g3

You can see that this lets us calculate My_Answer for all values at once. If you want to find out the values for t == 10, you can just pull out those rows:

foo %>%
  filter(t == 10)

  ID  t My_Answer
1  1 10        g1
2  2 10        g2
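
If you would rather pass the time point in and get numeric categories back, as in the question's desired output, the same idea can be wrapped in a small function. This is only a sketch in base R (merge() instead of left_join(), so no packages are needed); the function name categorize_at and the column suffixes _base and _t are made up for illustration, and the interval boundaries follow the cut() call above, since the question does not say which category a change of exactly 0.25 or -0.30 belongs to.

```r
# Sample data from the question
data <- data.frame(
  ID = rep(1:2, each = 5),
  t  = rep(c(0, 5, 10, 20, 30), times = 2),
  a  = c(100, 105, 126, 130, 150, 100, 90, 76, 51, 40)
)

# Hypothetical helper: category (1-4) of the percent change at t_given
categorize_at <- function(df, t_given) {
  baseline <- df[df$t == 0, c("ID", "a")]        # one baseline row per ID
  target   <- df[df$t == t_given, c("ID", "a")]  # value at the requested time
  both <- merge(baseline, target, by = "ID", suffixes = c("_base", "_t"))
  change <- (both$a_t - both$a_base) / both$a_base
  # Intervals are [-1, -0.3), [-0.3, 0.25), [0.25, Inf]; labels are category numbers
  category <- cut(change, breaks = c(-1, -0.3, 0.25, Inf),
                  right = FALSE, include.lowest = TRUE, labels = c(3, 2, 1))
  category <- as.integer(as.character(category))
  category[change == -1] <- 4L                   # a change of exactly -1 is category 4
  data.frame(ID = both$ID, My_Answer = category)
}

categorize_at(data, 10)
```

For the sample data, categorize_at(data, 10) gives category 1 for ID 1 and category 2 for ID 2, the same as g1 and g2 above. categorize_at(data, 20) gives 1 and 3.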

Upvotes: 1
