Reputation: 235
I would like to replace the value in a data frame cell based on the column name matching a row variable, as in the example code. I know enough to know that the nested loop is not the preferred way to do this (and that a more efficient solution is probably quite simple), but haven't been able to figure out a clean vectorized approach. I've been messing with nested apply, i.e., an apply inside a function called by apply, but haven't been successful, nor does that seem like it would be the best way, either.
Input:
test.df
Group G1 G2 G3
1 G1 0.63910462 0.5738143 0.97428347
2 G2 0.62578294 0.4653417 0.92010090
3 G3 0.01136336 0.3163722 0.20266664
4 G1 0.09054996 0.1984567 0.17488220
5 G2 0.74865266 0.9862222 0.80725355
6 G3 0.82855980 0.2668529 0.06786335
7 G1 0.74310410 0.8861169 0.44801963
8 G2 0.97329786 0.6682355 0.69658779
9 G3 0.68696020 0.7362139 0.77452962
Output:
test.df
Group G1 G2 G3
1 G1 NA 0.5738143 0.9742835
2 G2 0.62578294 NA 0.9201009
3 G3 0.01136336 0.3163722 NA
4 G1 NA 0.1984567 0.1748822
5 G2 0.74865266 NA 0.8072535
6 G3 0.82855980 0.2668529 NA
7 G1 NA 0.8861169 0.4480196
8 G2 0.97329786 NA 0.6965878
9 G3 0.68696020 0.7362139 NA
Code:
test.df <- data.frame("Group"=rep(c("G1", "G2", "G3"), 3), "G1"=runif(9, 0, 1), "G2"=runif(9, 0, 1), "G3" = runif(9,0,1))
for (j in 1:ncol(test.df)) {
for(i in 1:nrow(test.df)) {
if(colnames(test.df)[j] == test.df$Group[i]) {
test.df[i,j] <- NA
}
}
}
I think I could use dplyr filter to get the values that need to be replaced, replace them, and then reassemble the data frame, but I'm interested in learning other options.
Upvotes: 2
Views: 2784
Reputation: 14958
Passing the Baton:
test.df %>% mutate(G1 = ifelse(Group == "G1", NA, G1))
# Group G1 G2 G3
# 1 G1 NA 0.3337749 0.3999944
# 2 G2 0.25801678 0.4763512 0.3253522
# 3 G3 0.47854525 0.8921983 0.7570871
# 4 G1 NA 0.8643395 0.2026923
# 5 G2 0.08424691 0.3899895 0.7111212
# 6 G3 0.87532133 0.7773207 0.1216919
# 7 G1 NA 0.9606180 0.2454885
# 8 G2 0.83944035 0.4346595 0.1433044
# 9 G3 0.34668349 0.7125147 0.2396294
Now, how do I get the mutate to sweep through all the columns?
Upvotes: 1
Reputation: 66819
You could split rows and then sub-assign:
s = split(seq_len(nrow(test.df)), test.df$Group)
for (k in names(s)) test.df[s[[k]], k] <- NA_real_
Or with data.table:
library(data.table)
for (k in names(s)) set(test.df, i = s[[k]], j = k, v = NA_real_)
Note that you don't actually have to use a data.table here; we're using set
on a data.frame. The advantage of set
in this case is its modification by reference.
Upvotes: 2
Reputation: 99331
Vectorize it with match()
.
idx <- cbind(seq_len(nrow(test.df)), match(test.df$Group, names(test.df)[-1]))
test.df[-1][idx] <- NA
idx
creates the index matrix to use for gathering the values we want to convert to NA. The first part is just a sequence the length of the number of rows. The second part matches the Group
column with the names of the other columns. Then we replace.
This results in the updated test.df
Group G1 G2 G3
1 G1 NA 0.5738143 0.9742835
2 G2 0.62578294 NA 0.9201009
3 G3 0.01136336 0.3163722 NA
4 G1 NA 0.1984567 0.1748822
5 G2 0.74865266 NA 0.8072535
6 G3 0.82855980 0.2668529 NA
7 G1 NA 0.8861169 0.4480196
8 G2 0.97329786 NA 0.6965878
9 G3 0.68696020 0.7362139 NA
Note: I had stringsAsFactors = FALSE
in my creation of test.df
, so in yours you will have to use test.df$Group <- as.character(test.df$Group)
first since your example has it as factors.
Upvotes: 3