Emma Sylvester

Reputation: 149

Assign ID across 2 columns of variable

I have a data frame in which each individual (row) has two data points per variable.

Example data:

df1 <- read.table(text = "IID L1.1 L1.2 L2.1 L2.2 
                        1  1  38V1 38V1 48V1 52V1
                        2  2  36V1 38V2 50V1 48Y1
                        3  3  37Y1 36V1 50V2 48V1
                        4  4  38V2 36V2 52V1 50V2",
  stringsAsFactors = FALSE, header = TRUE)

My full dataset has many more columns than this. I would like to recode these values as unique identifiers shared across the two columns of each variable. I know how to get identifiers and relabel a single column from previous questions (Creating a unique ID and How to assign a unique ID number to each group of identical values in a column), but I don't know how to use the information from two columns at once, as R identifies and labels factors per column.

Ultimately I want something that would look like this for the above data:

(df2)
  IID L1.1 L1.2 L2.1 L2.2 
1  1   1    1    1    4
2  2   2    4    2    5
3  3   3    2    3    1
4  4   4    5    4    3

It doesn't really matter what the numbers are, as long as they indicate unique values across both columns. I've tried creating a function based on the output from:

unique(df1[,1:2]) 

but I am struggling, as this still looks at unique entries per column rather than across the two.

Upvotes: 1

Views: 103

Answers (2)

Mike H.

Reputation: 14360

You could reshape it to long format, assign the groups and then recast it to wide:

library(data.table)

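# Convert to a data.table first so melt() uses the data.table method
df_m <- melt(as.data.table(df1), id.vars = "IID")
# Group by column prefix (name without its last character, e.g. "L1.") and value;
# .GRP numbers the groups in order of first appearance
df_m[, id := .GRP, by = .(sub(".$", "", variable), value)]
dcast(df_m, IID ~ variable, value.var = "id")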

#  IID L1.1 L1.2 L2.1 L2.2
#1   1    1    1    6    9
#2   2    2    4    7   10
#3   3    3    2    8    6
#4   4    4    5    9    8

This extends easily to more groups of columns: if you also have L3.1 and L3.2, they are handled the same way. Note that the ids keep counting across groups (L2 starts at 6 here) rather than restarting at 1 for each group.
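If the ids should instead restart at 1 within each pair, as in the question's desired output, one possible variation is to number the values by order of first appearance inside each column group. This is only a sketch: the `grp` column and the regular expression for the prefix are assumptions of mine, and it presumes column names of the form `<prefix>.<suffix>`.

```r
library(data.table)

# Example data from the question
df1 <- read.table(text = "IID L1.1 L1.2 L2.1 L2.2
1  1  38V1 38V1 48V1 52V1
2  2  36V1 38V2 50V1 48Y1
3  3  37Y1 36V1 50V2 48V1
4  4  38V2 36V2 52V1 50V2",
  stringsAsFactors = FALSE, header = TRUE)

# Long format: one row per (IID, column) pair
df_m <- melt(as.data.table(df1), id.vars = "IID", variable.factor = FALSE)

# Column group = name with the trailing ".<suffix>" removed ("L1.2" -> "L1")
# (assumed naming scheme)
df_m[, grp := sub("\\.[^.]+$", "", variable)]

# Within each group, number the values by order of first appearance
df_m[, id := match(value, unique(value)), by = grp]

# Back to wide format, one column per original variable
res <- dcast(df_m, IID ~ variable, value.var = "id")
```

Because the ids are assigned per group, the same number in L1 and L2 columns refers to different underlying values.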

Upvotes: 2

Andrew Gustar

Reputation: 18425

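A base R loop over consecutive pairs of columns would work, assuming the first column is the ID and each variable occupies two adjacent columns: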

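# Number of column pairs, assuming column 1 is IID and the rest come in adjacent pairs
pairs <- (ncol(df1) - 1) / 2
for (i in seq_len(pairs)) {
  # Pool the values of both columns so they share one set of ids
  refs <- unique(c(df1[, 2 * i], df1[, 2 * i + 1]))
  df1[, 2 * i] <- match(df1[, 2 * i], refs)
  df1[, 2 * i + 1] <- match(df1[, 2 * i + 1], refs)
}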

df1
  IID L1.1 L1.2 L2.1 L2.2
1   1    1    1    1    4
2   2    2    4    2    5
3   3    3    2    3    1
4   4    4    5    4    3
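
If the paired columns are not guaranteed to be adjacent, the same `unique()` plus `match()` idea could be keyed on column names instead of positions. The following is a rough sketch, not a tested general solution: `recode_by_prefix` is a hypothetical helper name, and it assumes columns are named `<prefix>.<suffix>` and hold character values.

```r
# Example data from the question
df1 <- read.table(text = "IID L1.1 L1.2 L2.1 L2.2
1  1  38V1 38V1 48V1 52V1
2  2  36V1 38V2 50V1 48Y1
3  3  37Y1 36V1 50V2 48V1
4  4  38V2 36V2 52V1 50V2",
  stringsAsFactors = FALSE, header = TRUE)

# Hypothetical helper: recode all columns sharing a name prefix with one set of ids
recode_by_prefix <- function(df, id_col = "IID") {
  cols <- setdiff(names(df), id_col)
  prefix <- sub("\\.[^.]+$", "", cols)  # "L1.1" -> "L1" (assumed naming scheme)
  for (p in unique(prefix)) {
    grp <- cols[prefix == p]
    # Pool every value in the group, column by column, in order of first appearance
    refs <- unique(unlist(df[grp], use.names = FALSE))
    # Replace each value with its position in the pooled reference vector
    df[grp] <- lapply(df[grp], match, table = refs)
  }
  df
}

df2 <- recode_by_prefix(df1)
```

Unlike the loop above, this returns a recoded copy and leaves `df1` unchanged, and it also handles groups with more than two columns.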

Upvotes: 2
