Reputation: 59
In R, I would like to create two new variables (var3 and var4) based on conditions applied to two existing variables (var1 and var2), which contain duplicate records. Here is what my data looks like:
Var1 var2
01 A
01 B
01 A
02 C
02 C
03 D
04 E
04 D
04 F
. .
. .
. .
. .
. .
In SAS, I would use the following if-then-else statements:
if var1 = 01 and var2 = "A" then do; var3 = "New York"; var4 = "Buffalo"; end;
else if var1 = 01 and var2 = "B" then do; var3 = "New York"; var4 = "Cornell"; end;
else if var1 = 02 and var2 = "C" then do; var3 = "North Carolina"; var4 = "Raleigh"; end;
else if var1 = 03 and var2 = "D" then do; var3 = "Texas"; var4 = "Dallas"; end;
My output should look like this:
Var1 var2 var3 var4
01 A New York Buffalo
01 B New York Cornell
01 A New York Buffalo
02 C North Carolina Raleigh
02 C North Carolina Raleigh
03 D Texas Dallas
. . . .
. . . .
. . . .
. . . .
Any help creating the above output in R is greatly appreciated. Do I need to use if-else with a for loop, ifelse, or something else?
Upvotes: 1
Views: 5674
Reputation: 887881
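You could create a lookup dataset ('df2') and merge it with the original dataset ('df1'). By default, merge() joins on the columns the two data frames share (here var1 and var2) and sorts the result by them, so the row order can differ from the input.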
merge(df1, df2)
# var1 var2 var3 var4
#1 01 A New York Buffalo
#2 01 A New York Buffalo
#3 01 B New York Cornell
#4 02 C North Carolina Raleigh
#5 02 C North Carolina Raleigh
#6 03 D Texas Dallas
# Data used above
df1 <- structure(list(var1 = c("01", "01", "01", "02", "02", "03"),
var2 = c("A", "B", "A", "C", "C", "D")), .Names = c("var1",
"var2"), row.names = c(NA, -6L), class = "data.frame")
df2 <- data.frame(var1=c('01', '01', '02', '03'), var2=LETTERS[1:4],
var3=c('New York', 'New York', 'North Carolina', 'Texas'),
var4=c('Buffalo', 'Cornell', 'Raleigh', 'Dallas'))
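The question's data also contains rows with no matching rule (for example 04 E), and a plain merge() drops those. As a rough sketch, assuming you want to keep such rows with NA and preserve the input order, a left join with a helper column could look like this (row_id and res are hypothetical names introduced only for illustration):

```r
# Sample data from the question, including a row (04, E) that has no rule
df1 <- data.frame(var1 = c("01", "01", "01", "02", "02", "03", "04"),
                  var2 = c("A", "B", "A", "C", "C", "D", "E"),
                  stringsAsFactors = FALSE)

# Lookup table: one row per (var1, var2) rule
df2 <- data.frame(var1 = c("01", "01", "02", "03"),
                  var2 = c("A", "B", "C", "D"),
                  var3 = c("New York", "New York", "North Carolina", "Texas"),
                  var4 = c("Buffalo", "Cornell", "Raleigh", "Dallas"),
                  stringsAsFactors = FALSE)

# Remember the original row order, because merge() sorts on the key columns
df1$row_id <- seq_len(nrow(df1))

# all.x = TRUE keeps every row of df1; unmatched rows get NA in var3 and var4
res <- merge(df1, df2, by = c("var1", "var2"), all.x = TRUE)

# Restore the original order and drop the helper column
res <- res[order(res$row_id), ]
res$row_id <- NULL
rownames(res) <- NULL
res
```

Keeping the rules in a data frame means new combinations are added as rows in df2 rather than as extra branches in the code.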
Upvotes: 0
Reputation: 3488
# Var1 has to be referenced through df; this assumes Var1 is numeric
df$var3 <- ifelse(df$Var1 == 1, "New York",
           ifelse(df$Var1 == 2, "North Carolina",
           ifelse(df$Var1 == 3, "Texas", NA)))
df$var4<-....
Or by applying labels with factor() (this yields a factor rather than a character vector):
df$var3<-factor(df$Var1,
levels = 1:3,
labels = c("New York","North Carolina","Texas"))
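Since var4 in the question depends on both variables, one possible way to write it is to combine the two conditions with & inside nested ifelse() calls. This is only a sketch under the assumption that Var1 is numeric and var2 is character; the sample data frame is made up from the rows in the question.

```r
# Assumption: Var1 is numeric and var2 is a character column
df <- data.frame(Var1 = c(1, 1, 1, 2, 2, 3, 4),
                 var2 = c("A", "B", "A", "C", "C", "D", "E"),
                 stringsAsFactors = FALSE)

# Each branch tests both columns; rows that match no rule get NA
df$var4 <- ifelse(df$Var1 == 1 & df$var2 == "A", "Buffalo",
           ifelse(df$Var1 == 1 & df$var2 == "B", "Cornell",
           ifelse(df$Var1 == 2 & df$var2 == "C", "Raleigh",
           ifelse(df$Var1 == 3 & df$var2 == "D", "Dallas", NA))))
df
```

The nesting grows with every rule, so for more than a handful of combinations a lookup table joined with merge() is probably easier to maintain.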
Upvotes: 3