littleworth
littleworth

Reputation: 5169

How to complete missing values in column and row of a dataframe based on vector as source

I have the following data frame:

library(dplyr)
library(tibble)

df <- tibble(
  source = c("a", "b", "b"),
  target = c("a", "b", "c"),
  score = c(10, 5, 3) ) %>%
  pivot_wider(names_from = target, values_from = score) %>%
  as.data.frame() %>%
  column_to_rownames(var = "source")


df

It looks like this

  a  b  c
a 10 NA NA
b NA  5  3

Now the column and row names are incomplete. The complete names is given in this vector:

master_source_target <- c("a", "b","c","d")

I want to complete in the column and row of the data frame based on that vector so that the end it will look like this:

   a  b  c  d
a 10 NA NA  NA
b NA  5  3  NA
c NA  NA NA NA
d NA  NA NA NA

How can I achieve that?

Upvotes: 1

Views: 269

Answers (3)

akrun
akrun

Reputation: 886938

It is easier with a matrix of NAs and then do the assignment based on the dimnames of 'df'

new <- matrix(NA, length(master_source_target), 
    length(master_source_target), 
       dimnames = list(master_source_target, master_source_target))
new[row.names(df), colnames(df)] <- unlist(df)
new
#   a  b  c  d
#a 10 NA NA NA
#b NA  5  3 NA
#c NA NA NA NA
#d NA NA NA NA

As the OP used tibble, we can also make use of tidyverse approachess

library(tibble)
library(dplyr)
library(tidyr)
df %>% 
   rownames_to_column('rn') %>%
   pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
   complete(rn = master_source_target, name = master_source_target) %>%
   pivot_wider(names_from = name, values_from = value) %>% 
   column_to_rownames('rn')

Or with crossing and a join

crossing(Var1 = master_source_target, Var2 = master_source_target) %>% 
   left_join(as.data.frame.table(as.matrix(df))) %>% 
   pivot_wider(names_from = Var2, values_from = Freq) %>%
   column_to_rownames('Var1')

Upvotes: 1

lroha
lroha

Reputation: 34291

You can do it in a single step:

df[setdiff(master_source_target, rownames(df)), master_source_target] <- NA

df
   a  b  c  d
a 10 NA NA NA
b NA  5  3 NA
c NA NA NA NA
d NA NA NA NA

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388807

You could add missing column names first and then rbind missing rows with NA.

df[setdiff(master_source_target, names(df))] <- NA
rbind(df,matrix(NA, nrow = length(master_source_target) - nrow(df), ncol = ncol(df),
        dimnames = list(setdiff(master_source_target, rownames(df)), names(df))))

#   a  b  c  d
#a 10 NA NA NA
#b NA  5  3 NA
#c NA NA NA NA
#d NA NA NA NA

Upvotes: 1

Related Questions