CJR
CJR

Reputation: 3572

R: Calculate linear regression and get slope for "a subset of data"

My goal is to find out half life (from terminal phase if anyone is familiar with Pharmacokinetics)

I have some data containing the following;

1500 rows, with ID being main "key". There is 15 rows per ID. Then I have other columns TIME and CONCENTRATION. Now What I want to do is, for each ID, remove the first TIME (which equals "000" (numeric)), then run lm() function on the remaining 14 rows per ID, and then use abs() to extract the absolute value of the slope, then then save this to a new column named THALF. (If anyone is familiar with Pharmacokinetics maybe there is better way to do this?)

But I have not be able to do this using my limited knowledge of R.

Here is what I've come up with so far:

data_new <- data %>% dplyr::group_by(data $ID) %>% dplyr::filter(data $TIME != 10) %>% dplyr::mutate(THAFL = abs(lm$coefficients[2](data $CONC ~ data $TIME)))

From what I've understood from other Stackoverflow answers, lm$coefficients[2] will extract the slope.

But however, I have not been able to make this work. I get this error from trying to run the code:

Error: Problem with `mutate()` input `..1`.
x Input `..1` can't be recycled to size 15.
i Input `..1` is `data$ID`.
i Input `..1` must be size 15 or 1, not 1500.
i The error occurred in group 1: data$ID = "pat1".

Any suggestions on how to solve this? IF you need more info, let me know please.

(Also, if anyone is familiar with Pharmacokinetics, when they ask for half life from terminal phase, do I do lm() from the concentration max ? I Have a column with value for the highest observed concentration at what time. )

Upvotes: 1

Views: 1605

Answers (2)

Ric S
Ric S

Reputation: 9247

If after the model fitting you still need the observations with TIME == 10, you can try summarising after you group by ID and then using a right join

data %>% 
  filter(TIME != 10) %>% 
  group_by(ID) %>%
  summarise(THAFL = abs(lm(CONC ~ TIME)$coefficients[2])) %>% 
  right_join(data, by = "ID")


# A tibble: 30 x 16
   ID      THAFL Sex   Weight..kg. Height..cm. Age..yrs. T134A A443G G769C G955C A990C  TIME  CONC LBM   `data_combine$ID`  CMAX
   <chr>   <dbl> <chr>       <int>       <int>     <int> <int> <int> <int> <int> <int> <dbl> <dbl> <chr> <chr>             <dbl>
 1 pat1  0.00975 F              50         135        47     0     2     1     2     0    10  0    Under pat1                 60
 2 pat1  0.00975 F              50         135        47     0     2     1     2     0    20  6.93 Under pat1                 60
 3 pat1  0.00975 F              50         135        47     0     2     1     2     0    30 12.2  Under pat1                 60
 4 pat1  0.00975 F              50         135        47     0     2     1     2     0    45 14.8  Under pat1                 60
 5 pat1  0.00975 F              50         135        47     0     2     1     2     0    60 15.0  Under pat1                 60
 6 pat1  0.00975 F              50         135        47     0     2     1     2     0    90 12.4  Under pat1                 60
 7 pat1  0.00975 F              50         135        47     0     2     1     2     0   120  9.00 Under pat1                 60
 8 pat1  0.00975 F              50         135        47     0     2     1     2     0   150  6.22 Under pat1                 60
 9 pat1  0.00975 F              50         135        47     0     2     1     2     0   180  4.18 Under pat1                 60
10 pat1  0.00975 F              50         135        47     0     2     1     2     0   240  1.82 Under pat1                 60
# ... with 20 more rows

If after the model fitting you don't want the rows with TIME == 10 to appear on your dataset, you can use mutate

data %>% 
  filter(TIME != 10) %>% 
  group_by(ID) %>%
  mutate(THAFL = abs(lm(CONC ~ TIME)$coefficients[2]))

# A tibble: 28 x 16
# Groups:   ID [2]
   ID    Sex   Weight..kg. Height..cm. Age..yrs. T134A A443G G769C G955C A990C  TIME  CONC LBM   `data_combine$ID`  CMAX   THAFL
   <chr> <chr>       <int>       <int>     <int> <int> <int> <int> <int> <int> <dbl> <dbl> <chr> <chr>             <dbl>   <dbl>
 1 pat1  F              50         135        47     0     2     1     2     0    20  6.93 Under pat1                 60 0.00975
 2 pat2  M              75         175        29     0     2     0     0     0    20  6.78 Under pat2                 60 0.00835
 3 pat1  F              50         135        47     0     2     1     2     0    30 12.2  Under pat1                 60 0.00975
 4 pat2  M              75         175        29     0     2     0     0     0    30 11.6  Above pat2                 60 0.00835
 5 pat1  F              50         135        47     0     2     1     2     0    45 14.8  Under pat1                 60 0.00975
 6 pat2  M              75         175        29     0     2     0     0     0    45 13.5  Under pat2                 60 0.00835
 7 pat1  F              50         135        47     0     2     1     2     0    60 15.0  Under pat1                 60 0.00975
 8 pat2  M              75         175        29     0     2     0     0     0    60 13.1  Above pat2                 60 0.00835
 9 pat1  F              50         135        47     0     2     1     2     0    90 12.4  Under pat1                 60 0.00975
10 pat2  M              75         175        29     0     2     0     0     0    90  9.77 Under pat2                 60 0.00835
# ... with 18 more rows

Upvotes: 1

Duck
Duck

Reputation: 39595

You can use broom:

library(broom)
library(dplyr)
#Code
data %>% group_by(ID) %>%
  filter(TIME!=10) %>%
  do(fit = tidy(lm(CONC ~ TIME, data = .))) %>% 
  unnest(fit) %>%
  filter(term=='TIME') %>%
  mutate(estimate=abs(estimate))

Output:

# A tibble: 2 x 6
  ID    term  estimate std.error statistic p.value
  <chr> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 pat1  TIME   0.00975   0.00334     -2.92  0.0128
2 pat2  TIME   0.00835   0.00313     -2.67  0.0204

If joining with original data is needed, try:

#Code 2
data <- data %>% left_join(data %>% group_by(ID) %>%
  filter(TIME!=10) %>%
  do(fit = tidy(lm(CONC ~ TIME, data = .))) %>% 
  unnest(fit) %>%
  filter(term=='TIME') %>%
  mutate(estimate=abs(estimate)) %>%
  select(c(ID,estimate)))

Similar to @RicS.

Upvotes: 1

Related Questions