Reputation: 235

Replace cell values in data frame based on column name and variable match

I would like to replace the value in a data frame cell based on the column name matching a row variable, as in the example code. I know enough to know that the nested loop is not the preferred way to do this (and that a more efficient solution is probably quite simple), but haven't been able to figure out a clean vectorized approach. I've been messing with nested apply, i.e., an apply inside a function called by apply, but haven't been successful, nor does that seem like it would be the best way, either.

Input:

test.df
  Group         G1        G2         G3
1    G1 0.63910462 0.5738143 0.97428347
2    G2 0.62578294 0.4653417 0.92010090
3    G3 0.01136336 0.3163722 0.20266664
4    G1 0.09054996 0.1984567 0.17488220
5    G2 0.74865266 0.9862222 0.80725355
6    G3 0.82855980 0.2668529 0.06786335
7    G1 0.74310410 0.8861169 0.44801963
8    G2 0.97329786 0.6682355 0.69658779
9    G3 0.68696020 0.7362139 0.77452962

Output:

test.df
  Group         G1        G2        G3
1    G1         NA 0.5738143 0.9742835
2    G2 0.62578294        NA 0.9201009
3    G3 0.01136336 0.3163722        NA
4    G1         NA 0.1984567 0.1748822
5    G2 0.74865266        NA 0.8072535 
6    G3 0.82855980 0.2668529        NA
7    G1         NA 0.8861169 0.4480196
8    G2 0.97329786        NA 0.6965878
9    G3 0.68696020 0.7362139        NA

Code:

test.df <- data.frame("Group"=rep(c("G1", "G2", "G3"), 3), "G1"=runif(9, 0, 1), "G2"=runif(9, 0, 1), "G3" = runif(9,0,1))

for (j in 1:ncol(test.df)) {
  for(i in 1:nrow(test.df)) {
    if(colnames(test.df)[j] == test.df$Group[i]) {
      test.df[i,j] <- NA
    }
  }
}

I think I could use dplyr filter to get the values that need to be replaced, replace them, and then reassemble the data frame, but I'm interested in learning other options.

Upvotes: 2

Answers (3)

leerssej

Reputation: 14958

Passing the Baton:

test.df %>% mutate(G1 = ifelse(Group == "G1", NA, G1))

#   Group         G1        G2        G3
# 1    G1         NA 0.3337749 0.3999944
# 2    G2 0.25801678 0.4763512 0.3253522
# 3    G3 0.47854525 0.8921983 0.7570871
# 4    G1         NA 0.8643395 0.2026923
# 5    G2 0.08424691 0.3899895 0.7111212
# 6    G3 0.87532133 0.7773207 0.1216919
# 7    G1         NA 0.9606180 0.2454885
# 8    G2 0.83944035 0.4346595 0.1433044
# 9    G3 0.34668349 0.7125147 0.2396294

Now, how do I get the mutate to sweep through all the columns?

Upvotes: 1

Frank

Reputation: 66819

You could split rows and then sub-assign:

s = split(seq_len(nrow(test.df)), test.df$Group)
for (k in names(s)) test.df[s[[k]], k] <- NA_real_

Or with data.table:

library(data.table)
for (k in names(s)) set(test.df, i = s[[k]], j = k, v = NA_real_)

Note that you don't actually have to use a data.table here; we're using set on a data.frame. The advantage of set in this case is its modification by reference.

Upvotes: 2

Rich Scriven

Reputation: 99331

Vectorize it with match().

idx <- cbind(seq_len(nrow(test.df)), match(test.df$Group, names(test.df)[-1]))
test.df[-1][idx] <- NA

idx creates the index matrix to use for gathering the values we want to convert to NA. The first part is just a sequence the length of the number of rows. The second part matches the Group column with the names of the other columns. Then we replace.

This results in the updated test.df

Group         G1        G2        G3
1    G1         NA 0.5738143 0.9742835
2    G2 0.62578294        NA 0.9201009
3    G3 0.01136336 0.3163722        NA
4    G1         NA 0.1984567 0.1748822
5    G2 0.74865266        NA 0.8072535
6    G3 0.82855980 0.2668529        NA
7    G1         NA 0.8861169 0.4480196
8    G2 0.97329786        NA 0.6965878
9    G3 0.68696020 0.7362139        NA

Note: I had stringsAsFactors = FALSE in my creation of test.df, so in yours you will have to use test.df$Group <- as.character(test.df$Group) first since your example has it as factors.

Upvotes: 3

Replace cell values in data frame based on column name and variable match

Answers (3)

Related Questions