Ankita Chowdhury
Ankita Chowdhury

Reputation: 43

Manipulate dataframe value to match different columns

I have a dataframe as below. If cluster is NA, then I want to look for the "id" column in "To" column and populate the New_Col with the matching row's cluster value.

id  cluster from    to
A   NA      NA      NA
B   2       B       D
C   5       C       A
D   NA      NA      NA
E   5       E       B
F   NA      NA      NA
G   3       G       F

Expected output

id  cluster from    to  New_Col
A   NA      NA      NA  5
B   2       B       D   2
C   5       C       A   5
D   NA      NA      NA  2
E   5       E       B   5
F   NA      NA      NA  3
G   3       G       F   3

Upvotes: 0

Views: 67

Answers (3)

sachin2014
sachin2014

Reputation: 462

Here is another option using for loop and which

use which to find index where column 'to' matches column 'id' when column 'cluster' is NA

for (i in 1:length(df$cluster)){
      df$new_col[i] =  ifelse(is.na(df$cluster[i])==T,df$cluster[which(df$to==df$id[i])],df$cluster[i])
    }

data

df <- data.frame(id= c("A", "B", "C","D", "E", "F", "G"),
                 cluster = c(NA, 2L, 5L,NA, 5L, NA, 3L),
                 from =c(NA, "B", "C",NA, "E", NA, "G"),
                 to = c(NA, "D", "A",NA, "B", NA, "F"), stringsAsFactors = F)

output

df
  id cluster from   to new_col
1  A      NA <NA> <NA>       5
2  B       2    B    D       2
3  C       5    C    A       5
4  D      NA <NA> <NA>       2
5  E       5    E    B       5
6  F      NA <NA> <NA>       3
7  G       3    G    F       3

Upvotes: 0

hello_friend
hello_friend

Reputation: 5788

Using @Ronak Shah's logic (Base R solution):

df$new_col <-  ifelse(is.na(df$cluster), df$cluster[match(df$id, df$to)], df$cluster)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

We can use match :

#Copy cluster value
df$New_col <- df$cluster
#Get NA indices
inds <- is.na(df$New_col)
#Get corresponding cluster values for NA values.
df$New_col[inds] <- with(df, cluster[match(id[inds], to)])
df

#  id cluster from   to New_col
#1  A      NA <NA> <NA>       5
#2  B       2    B    D       2
#3  C       5    C    A       5
#4  D      NA <NA> <NA>       2
#5  E       5    E    B       5
#6  F      NA <NA> <NA>       3
#7  G       3    G    F       3

data

df <- structure(list(id = structure(1:7, .Label = c("A", "B", "C", 
"D", "E", "F", "G"), class = "factor"), cluster = c(NA, 2L, 5L, 
NA, 5L, NA, 3L), from = structure(c(NA, 1L, 2L, NA, 3L, NA, 4L
), .Label = c("B", "C", "E", "G"), class = "factor"), to = structure(c(NA, 
3L, 1L, NA, 2L, NA, 4L), .Label = c("A", "B", "D", "F"), class = "factor")), 
class = "data.frame", row.names = c(NA, -7L))

Upvotes: 2

Related Questions