Reputation: 125
I have a data frame structured as follows:
df <- structure(list(NAME1 = c("AAA","CCC","BBB","BBB"),
NAME2 = c("BBB", "AAA","DDD","AAA"),
AMT = c(10, 20, 30, 50)), .Names=c("NAME1","NAME2", "AMT"),
row.names = c("1", "2", "3", "4"), class =("data.frame"))
I would like to create two ID variables (ID1 and ID2) based on two string columns, namely NAME1 and NAME2. The two columns may share values so the IDs have to be consistent. The desired data frame should look like this:
df <- structure(list(NAME1 = c("AAA","CCC", "BBB", "BBB"),
NAME2 = c("BBB", "AAA","DDD", "AAA"),
ID1 = c(1,3,2,2),
ID2 = c(2,1,4,1),
AMT = c(10,20,30,50)),
.Names = c("NAME1","NAME2","ID1","ID2"),
row.names = c("1", "2", "3", "4"), class =("data.frame"))
Your suggestions would be appreciated.
Cheers.
Upvotes: 1
Views: 253
Reputation: 5532
The following answer would work and could be modified to allow IDs to be assigned within more than data frame (e.g. df1, df2); this assumes the variable afact
below is created with all of the desired factor levels. This also uses dplyr
to create the new columns.
library(dplyr)
adf <- structure(list(NAME1 = c("AAA","CCC","BBB","BBB"),
NAME2 = c("BBB", "AAA","DDD","AAA"),
AMT = c(10, 20, 30, 50)), .Names=c("NAME1","NAME2", "AMT"),
row.names = c("1", "2", "3", "4"), class =("data.frame"))
## Create factor based on all unique values.
## include all variables (e.g. NAME1) needed in factor.
afact <- as.factor(unique(sort(c(adf$NAME1, adf$NAME2))))
## Factor level to numeric value.
num.lookup <- function(x) { as.numeric(afact[afact == x])}
# Create the new ID columns using the factor 'afact' and 'num.lookup'
# to assign numeric values consistant across columns.
adf %>%
mutate(ID1 = sapply(NAME1, num.lookup),
ID2 = sapply(NAME2, num.lookup))
# NAME1 NAME2 AMT ID1 ID2
# 1 AAA BBB 10 1 2
# 2 CCC AAA 20 3 1
# 3 BBB DDD 30 2 4
# 4 BBB AAA 50 2 1
Upvotes: 0
Reputation: 3194
You should create a vector of both and turn it into a factor and then into numeric. Then you can properly subset it with the number of rows in the df and place them back:
newIDs <- as.numeric(as.factor(c(df$NAME1, df$NAME2)))
df$ID1 <- newIDs[1:nrow(df)]
df$ID2 <- newIDs[-c(1:nrow(df))]
Upvotes: 1