Mary B.

Reputation: 125

Recode variables within a group in R

For a research project, I need to change the coding of sample IDs (variable: Samp_ID) that stem from different articles (variable: Art_ID). My data is in long format. In the original data set, the sample IDs were coded as an ascending number across all coded articles. If an article uses the same sample multiple times, the sample ID is the same in multiple rows. However, if different samples were used, the sample ID differs within the same "group" of Art_ID. The data set looks like this:

df_original <- read.table(text=
"Art_ID   Samp_ID         
1         1                
2         2           
2         2          
2         2         
3         3 
4         4 
4         5          
5         6      
6         7
7         8  
7         8
7         8  
7         9   
7         9   
7         9
8         10", header=TRUE)

However, I would like to have the sample ID coded with an ascending number within each article. Thus, if only one sample has been used, each row for this article should be coded as 1. If two different samples have been used within one article, the rows using the first sample should be coded as 1 and the rows using the second sample should be coded as 2 (as in Art_ID == 4 in df_new). Finally, I aim to create a variable that is the combination of Art_ID and Samp_ID.

I would like the new data set to look like this:

df_new <- read.table(text=
"Art_ID   Samp_ID   Art_Samp_ID      
1         1         1_1       
2         1         2_1  
2         1         2_1 
2         1         2_1
3         1         3_1
4         1         4_1
4         2         4_2        
5         1         5_1     
6         1         6_1
7         1         7_1 
7         1         7_1
7         1         7_1
7         2         7_2  
7         2         7_2
7         2         7_2
8         1         8_1", header=TRUE)

To create the variable Art_Samp_ID, I would use this code:

df_new$Art_Samp_ID <- as.factor(paste(df_new$Art_ID, df_new$Samp_ID, sep = "_"))

Does anyone know how to recode Samp_ID most efficiently (e.g., using the tidyverse)? I am grateful for any advice!

Upvotes: 1

Views: 599

Answers (2)

Martin Gal

Reputation: 16998

A slightly different method using dplyr:

library(dplyr)

df_original %>% 
  group_by(Art_ID) %>% 
  mutate(Samp_ID = 1 + cumsum(Samp_ID != lag(Samp_ID, default = first(Samp_ID))),
         Art_Samp_ID = paste(Art_ID, Samp_ID, sep = "_")) %>% 
  ungroup()

returns

# A tibble: 16 x 3
   Art_ID Samp_ID Art_Samp_ID
    <int>   <dbl> <chr>      
 1      1       1 1_1        
 2      2       1 2_1        
 3      2       1 2_1        
 4      2       1 2_1        
 5      3       1 3_1        
 6      4       1 4_1        
 7      4       2 4_2        
 8      5       1 5_1        
 9      6       1 6_1        
10      7       1 7_1        
11      7       1 7_1        
12      7       1 7_1        
13      7       2 7_2        
14      7       2 7_2        
15      7       2 7_2        
16      8       1 8_1  
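
To make the lag/cumsum step easier to follow, here is a minimal base R sketch of the same idea on a single article's sample IDs. The vector `samp` and the helper names `prev`, `changed` and `new_id` are only illustrative and not part of the code above. Note that this approach counts runs of identical values, so it assumes that all rows of one sample sit next to each other within an article; a sample ID that reappears after a different one would get a new number.

```r
# Sample IDs for a single article (Art_ID == 7 in the question's data)
samp <- c(8, 8, 8, 9, 9, 9)

# Previous value of each element; the first element is compared with itself,
# which mirrors lag(Samp_ID, default = first(Samp_ID))
prev <- c(samp[1], head(samp, -1))

# TRUE wherever the sample ID differs from the row before it
changed <- samp != prev

# Running count of changes, shifted so that numbering starts at 1
new_id <- 1 + cumsum(changed)
new_id
# [1] 1 1 1 2 2 2
```

Inside `group_by(Art_ID)`, dplyr applies this logic separately to each article, which is why the counter restarts at 1 for every Art_ID.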

Upvotes: 2

Ronak Shah

Reputation: 389325

You can use dense_rank for Samp_ID and use unite to create Art_Samp_ID.

library(dplyr)
library(tidyr)

# Other ways to compute Samp_ID inside mutate():
#   Samp_ID = match(Samp_ID, unique(Samp_ID))
#   Samp_ID = as.integer(factor(Samp_ID))
df_original %>%
  group_by(Art_ID) %>%
  mutate(Samp_ID = dense_rank(Samp_ID)) %>%
  ungroup() %>%
  unite(Art_Samp_ID, Art_ID, Samp_ID, remove = FALSE)

#  Art_Samp_ID Art_ID Samp_ID
#   <chr>        <int>   <int>
# 1 1_1              1       1
# 2 2_1              2       1
# 3 2_1              2       1
# 4 2_1              2       1
# 5 3_1              3       1
# 6 4_1              4       1
# 7 4_2              4       2
# 8 5_1              5       1
# 9 6_1              6       1
#10 7_1              7       1
#11 7_1              7       1
#12 7_1              7       1
#13 7_2              7       2
#14 7_2              7       2
#15 7_2              7       2
#16 8_1              8       1
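
If you would rather avoid extra packages, the match/unique option can also be sketched in base R with `ave()`. This is only an illustration: the data frame is rebuilt by hand from the question's data, and the names `df_base` and `x` are made up for this example.

```r
# The question's data, rebuilt by hand so the example is self-contained
df_original <- data.frame(
  Art_ID  = c(1, 2, 2, 2, 3, 4, 4, 5, 6, 7, 7, 7, 7, 7, 7, 8),
  Samp_ID = c(1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10)
)

df_base <- df_original

# Within each Art_ID, number the samples by order of first appearance
df_base$Samp_ID <- ave(
  df_original$Samp_ID, df_original$Art_ID,
  FUN = function(x) match(x, unique(x))
)

# Combine article and recoded sample ID into one key
df_base$Art_Samp_ID <- paste(df_base$Art_ID, df_base$Samp_ID, sep = "_")

# The options only differ when IDs are not ascending within a group:
x <- c(9, 8, 9)
match(x, unique(x))    # numbers by first appearance: 1 2 1
as.integer(factor(x))  # numbers by sorted value:     2 1 2
```

For the data in the question the sample IDs ascend within every article, so numbering by first appearance and numbering by sorted value (which is also what dense_rank does) give the same result.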

Upvotes: 2
