Mats
Mats

Reputation: 21

Adding a new variable based on different dates in r

I have a big dataset including the variable "ID", "Date", "AS", "Value", "Condition Day 1", "Condition Day 2", and "measurement".

We measured amino acids in different subjects (ID) on two different dates (Date) with two different conditions (Condition Day 1 and 2). There are two measurements on every date.

ID Date AS Value Condition Day 1 Condition Day 2 measurement
1 2015-07-12 TRP 23 1 2 1
1 2015-07-12 TRP 11 1 2 2
1 2015-07-12 LEU .. 1 2
1 2015-07-12 LEU 1 2
1 2015-07-12 .. 1 2
1 2015-08-23 TRP 2 1
1 2015-08-23 TRP 2 1
1 2015-08-23 LEU 2 1
1 2015-08-23 LEU 2 1
... ... ... 2 1
2 .. .. ... .
2 .. ... ... .. .

Now I am trying to create a new date variable based on the old one. The first date should be assigned the value "1", the second date the value "2". The problem I have, is that the dates change from ID to ID, since every subject was tested on a different dates ranging from 2015 to 2018.

In the next step I want to create a new variable based on condition, and measurement ranging from 1 to 4.

"1" = measurement 1 + Condition 1 
"2" = measurement 2 + Condition 1
"3" = measurement 1 + Condition 2
"4" = measurement 2 + Condition 2

In the end the table should be looking something like this:

ID AS Value D_C
1 TRP 23 1
1 TRP 11 2
1 TRP ... 3
1 TRP ... 4
1 LEU .. 1
1 LEU ... 2
1 LEU ... 3
1 LEU ... 4
2 ... ... ...
2 ... ... ...

I am kind of stuck here, especially with the first question. Can anyone help? Thank you so much!

*** UPDATE - MRE *** here some example data

df = data.frame(ID=c(1, 1, 1, 1, 2, 2, 2, 2), 
Date=c("2015-07-15", "2015-07-15", "2015-07-30", "2015-07-30", "2016-07-15", "2016-07-15", "2016-07-30", "2016-07-30"),
AS = c("TRP", "TRP", "TRP", "TRP", "ILE", "ILE", "ILE", "ILE"), 
Concentration = c(3, 2, 1, 3, 3, 2, 1, 0),
Method_Day1 = c(1, 1, 2, 2, 2, 2, 1, 1),
Method_Day2 = c(2, 2, 1, 1, 1, 1, 2, 2), 
Sample = c(1, 2, 1, 2, 1, 2, 1, 2))

Upvotes: 0

Views: 1031

Answers (1)

pholzm
pholzm

Reputation: 1784

In order to encode the first and second date per ID, we can group by ID, and then compare each date against the groupwise max/min.

library(tidyverse)

df1 <- df %>% 
  group_by(ID) %>% 
  dplyr::mutate(
    date_encoding = case_when(
      Date == min(Date) ~ 1,
      Date == max(Date) ~ 2, 
      TRUE~0))
df1
# A tibble: 8 x 8
# Groups:   ID [2]
     ID Date       AS    Concentration Method_Day1 Method_Day2 Sample date_encoding
  <dbl> <chr>      <chr>         <dbl>       <dbl>       <dbl>  <dbl>       <dbl>
1     1 2015-07-15 TRP               3           1           2      1           1
2     1 2015-07-15 TRP               2           1           2      2           1
3     1 2015-07-30 TRP               1           2           1      1           2
4     1 2015-07-30 TRP               3           2           1      2           2
5     2 2016-07-15 ILE               3           2           1      1           1
6     2 2016-07-15 ILE               2           2           1      2           1
7     2 2016-07-30 ILE               1           1           2      1           2
8     2 2016-07-30 ILE               0           1           2      2           2

For the second part of your question, you can again use mutate and case_when based on the other column conditions, but the description you provide for Condition/Method Day 1/2 is not completely clear from the question.

EDIT: as this works only for the specific case of expecting exactly two Dates per ID, I'll also add the more general solution that will work for an arbitrary number of Dates per ID:

# sort and index Dates per ID
df %>% 
  dplyr::distinct(ID, Date) %>%
  group_by(ID) %>%
  arrange(Date, .by_group = TRUE) %>%
  dplyr::mutate(encoding2 = row_number()) %>%
# then join them back to the original dataframe
  right_join(df, by = c("ID", "Date")) 
# A tibble: 8 x 8
# Groups:   ID [2]
     ID Date       encoding2 AS    Concentration Method_Day1 Method_Day2 Sample
  <dbl> <chr>          <int> <chr>         <dbl>       <dbl>       <dbl>  <dbl>
1     1 2015-07-15         1 TRP               3           1           2      1
2     1 2015-07-15         1 TRP               2           1           2      2
3     1 2015-07-30         2 TRP               1           2           1      1
4     1 2015-07-30         2 TRP               3           2           1      2
5     2 2016-07-15         1 ILE               3           2           1      1
6     2 2016-07-15         1 ILE               2           2           1      2
7     2 2016-07-30         2 ILE               1           1           2      1
8     2 2016-07-30         2 ILE               0           1           2      2

Upvotes: 1

Related Questions