Reputation: 45
Had to make an account because this sequence of for loops has been annoying me for quite some time.
I have a data frame in R with 1000 rows and 10 columns, where every value is 1, 2, or 3. I would like to re-code EVERY entry so that 1 becomes 3, 2 stays 2, and 3 becomes 1. I understand that there are easier ways to do this, such as subsetting each column and hard-coding the condition, but this isn't always ideal, as many of the data sets that I work with have up to 100 columns.
I would like to use a nested loop in order to accomplish this task -- this is what I have thus far:
for(i in 1:nrow(dat_trans)){
  for(j in length(dat_trans)){
    if(dat_trans[i,j] == 1){
      dat_trans[i,j] <- 3
    } else if(dat_trans[i,j] == 2){
      dat_trans[i,j] <- 2
    } else{
      dat_trans[i,j] <- 1
    }
  }
}
So I iterate through the first column, grab every value, and change it based on the if/else condition. I am still learning R, so if you have any pointers on my code, feel free to point them out.
Upvotes: 4
Views: 8866
Reputation: 561
This is a swap operation, and there are many ways to swap values without for loops.
To set up a simple dataframe:
df <- data.frame(
  col1 = c(1,2,3),
  col2 = c(2,3,1),
  col3 = c(3,1,2)
)
Using a dummy value:
df[df==1] <- 4
df[df==3] <- 1
df[df==4] <- 3
Using a temporary variable:
dftemp <- df
df[dftemp==1] <- 3
df[dftemp==3] <- 1
Using multiplication/division and addition/subtraction:
df <- 4 - df
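The subtraction trick is not specific to a 1 to 3 scale: for any scale running from lo to hi, the reversed value is (lo + hi) - x. A minimal sketch, assuming a numeric scale; reverse_scale, lo, and hi are hypothetical names that do not appear in the original answer:

```r
# Hypothetical helper: reverse a numeric scale that runs from lo to hi.
reverse_scale <- function(x, lo = 1, hi = 3) {
  (lo + hi) - x
}

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(2, 3, 1),
  col3 = c(3, 1, 2)
)

# With the defaults this is the same as 4 - df and returns a data frame.
reversed <- reverse_scale(df)

# The same helper applied to a vector on a 1 to 5 scale.
likert5 <- reverse_scale(c(1, 2, 5), lo = 1, hi = 5)
```

This only works when the new value is a simple arithmetic function of the old one; arbitrary mappings need one of the other approaches.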
Using Boolean operations:
# assigning into df[] keeps the data frame; the right-hand side alone is a matrix
df[] <- (df==1) * 3 + (df==2) * 2 + (df==3) * 1
Using a bitwise xor (in case you really have a need for speed):
df[df!=2] <- sapply(df, function(x){bitwXor(2,x)})[df!=2]
If a nested for loop is required, the switch() function is a good option.
for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(df[j,i],3,2,1)
  }
}
Character labels can be used in switch() if the values are not as conveniently indexed as 1, 2, and 3.
for(i in seq(ncol(df))){
  for(j in seq(nrow(df))){
    df[j,i] <- switch(as.character(df[j,i]),
                      "1" = 3,
                      "2" = 2,
                      "3" = 1)
  }
}
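The text-keyed switch() above can also be written without any loop by indexing a named vector. This is a different technique from the answer's switch() loop, sketched here under the assumption that every value in the data frame has an entry in the lookup; the name lookup is hypothetical:

```r
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(2, 3, 1),
  col3 = c(3, 1, 2)
)

# Hypothetical lookup table: names are the old values, elements are the new ones.
lookup <- c("1" = 3, "2" = 2, "3" = 1)

# Index the lookup by name, one whole column at a time.
# Assigning into df[] keeps the data frame structure and column names.
df[] <- lapply(df, function(x) unname(lookup[as.character(x)]))
```

A value with no matching name in lookup would come back as NA, so it is worth checking for NAs afterwards if the data might contain unexpected codes.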
Upvotes: 3
Reputation: 72984
You could use an assignment matrix am: match() each value of a column of df1 against column 1 of am, select the corresponding entry in column 2, and assign it back to df1. In an lapply(), of course.
df1
# V1 V2 V3
# 1 1 2 1
# 2 1 2 1
# 3 1 1 2
# 4 1 3 2
# 5 2 3 2
am <- matrix(c(1, 2, 3, 3, 2, 1), 3)
am
# [,1] [,2]
# [1,] 1 3
# [2,] 2 2
# [3,] 3 1
df1[] <- lapply(df1, function(x) am[match(x, am[,1]), 2])
df1
# V1 V2 V3
# 1 3 2 3
# 2 3 2 3
# 3 3 3 2
# 4 3 1 2
# 5 2 1 2
Data:
df1 <- structure(list(V1 = c(1L, 1L, 1L, 1L, 2L), V2 = c(2L, 2L, 1L,
3L, 3L), V3 = c(1L, 1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-5L))
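One behavior of the match() approach worth knowing: a value that does not appear in column 1 of am has no match, so it becomes NA. A small sketch of that case, with one way to keep unmatched values unchanged instead; the vector x and the value 9 are made up for illustration:

```r
am <- matrix(c(1, 2, 3, 3, 2, 1), 3)

x <- c(1L, 3L, 2L, 9L)        # 9 does not appear in am[, 1]
idx <- match(x, am[, 1])      # position in am[, 1], NA where there is no match

plain <- am[idx, 2]           # unmatched value turns into NA

# Fall back to the original value wherever match() found nothing.
kept <- ifelse(is.na(idx), x, am[idx, 2])
```

Which of the two is preferable depends on whether an unexpected code should be flagged (NA) or passed through.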
Upvotes: 0
Reputation: 24079
R is a vectorized language, so you really don't need the inner loop.
Also, if you notice that 4 - "old value" = "new value", you can eliminate the if statements.
for(i in 1:ncol(dat_trans)){
  dat_trans[,i] <- 4-dat_trans[,i]
}
The loop now iterates over the columns only, 10 iterations in total, instead of visiting each of the 1000 rows one cell at a time. This should greatly improve performance.
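One thing worth checking in the question's loop: for(j in length(dat_trans)) iterates over a single value, the number of columns, so only the last column is ever visited. seq_along() (or 1:ncol()) visits every column. A small sketch on made-up data, which also shows that the subtraction can be applied to the whole data frame with no loop at all:

```r
# Made-up example data, not the asker's data set.
dat_trans <- data.frame(a = c(1, 2, 3), b = c(3, 3, 1), c = c(2, 1, 1))

# length() returns one number, so this loop body runs once, with j = 3.
visited_wrong <- c()
for (j in length(dat_trans)) visited_wrong <- c(visited_wrong, j)

# seq_along() returns 1, 2, 3, so every column is visited.
visited_right <- c()
for (j in seq_along(dat_trans)) visited_right <- c(visited_right, j)

# Column-wise loop, as in the answer above.
looped <- dat_trans
for (i in seq_along(looped)) {
  looped[, i] <- 4 - looped[, i]
}

# Whole data frame at once, no loop.
vectorized <- 4 - dat_trans
```

Both looped and vectorized hold the same recoded values; the loop-free form is simply shorter.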
Upvotes: 4
Reputation: 160447
This sounds like a merge/join operation.
set.seed(42)
dat_trans <- as.data.frame(
  setNames(lapply(1:3, function(ign) sample(1:3, size=10, replace=TRUE)),
           c("V1", "V2", "V3"))
)
dat_trans
# V1 V2 V3
# 1 3 2 3
# 2 3 3 1
# 3 1 3 3
# 4 3 1 3
# 5 2 2 1
# 6 2 3 2
# 7 3 3 2
# 8 1 1 3
# 9 2 2 2
# 10 3 2 3
newvals <- data.frame(old = c(1, 3), new = c(3, 1))
newvals
# old new
# 1 1 3
# 2 3 1
Using dplyr and tidyr:
library(dplyr)
library(tidyr) # gather, spread
dat_trans %>%
  mutate(rn = row_number()) %>%
  gather(k, v, -rn) %>%
  left_join(newvals, by = c("v" = "old")) %>%
  mutate(v = if_else(is.na(new), v, new)) %>%
  select(-new) %>%
  spread(k, v) %>%
  select(-rn)
# V1 V2 V3
# 1 1 2 1
# 2 1 1 3
# 3 3 1 1
# 4 1 3 1
# 5 2 2 3
# 6 2 1 2
# 7 1 1 2
# 8 3 3 1
# 9 2 2 2
# 10 1 2 1
(The need for rn is likely due to my use of an older version of tidyr: I'm at 0.8.2, though 1.0.0 has recently been released. That release did a lot of enhancement/work on spread/gather and introduced the pivot_* functions, which are likely much smoother at this. If you have a more recent version, try this without the rn portions.)
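For readers on tidyr 1.0.0 or later, here is a rough sketch of the same pipeline using pivot_longer() and pivot_wider() in place of gather() and spread(). It assumes dplyr and a recent tidyr are installed, uses a small made-up data frame with integer columns, and replaces the if_else() step with coalesce(). The sketch keeps the rn column, because pivot_wider() still needs something that identifies each original row:

```r
library(dplyr)
library(tidyr)  # pivot_longer, pivot_wider (tidyr >= 1.0.0)

# Made-up example data; old/new are integers so they match the column types.
dat <- data.frame(V1 = c(1L, 3L, 2L), V2 = c(3L, 3L, 1L))
newvals <- data.frame(old = c(1L, 3L), new = c(3L, 1L))

res <- dat %>%
  mutate(rn = row_number()) %>%                          # row id for reshaping back
  pivot_longer(-rn, names_to = "k", values_to = "v") %>% # one row per cell
  left_join(newvals, by = c("v" = "old")) %>%            # look up the new value
  mutate(v = coalesce(new, v)) %>%                       # keep v where no new value exists
  select(-new) %>%
  pivot_wider(names_from = k, values_from = v) %>%       # back to one column per variable
  select(-rn)
```

The result is a tibble rather than a plain data frame; as.data.frame() converts it back if that matters.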
Or a much-more-direct approach using a "recode" mindset:
dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], car::recode, "1=3; 3=1")
# or
dat_trans[,c("V1", "V2", "V3")] <- lapply(dat_trans[,c("V1", "V2", "V3")], dplyr::recode, '1' = 3L, '3' = 1L)
Upvotes: 0