Mr.Spock
Mr.Spock

Reputation: 521

R - Merging dataframe by row and column

I want to kind of transform a dataframe from long into wide format what one little extra task. I want to also merge it into another dataframe.

So this is my original df and I can transform it into a wide format.

  V1   V2 V3
1   A  IIA  1
2   A S1MU  5
3   A  AMU  3
4   C  IIU  6
5   C  IMA  7
6   A S2FU  8
7   B S1MU  9
8   A  IIU 13
9   B S1II  7
10  C S1II  9
11  D  IMA  4
12  A S1II  3

reshape(testframe, idvar = "V1", timevar = "V2", direction = "wide")

   V1 V3.IIA V3.S1MU V3.AMU V3.IIU V3.IMA V3.S2FU V3.S1II
1   A      1       5      3     13     NA       8       3
4   C     NA      NA     NA      6      7      NA       9
7   B     NA       9     NA     NA     NA      NA       7
11  D     NA      NA     NA     NA      4      NA      NA

And now I want to add that transformed dataframe into this dataframe:

  Code IIA S1MU AMU IIU IMA S2FU S1II IFA BIA
1    A  NA   NA  NA  NA  NA   NA   NA  NA  NA
2    B  NA   NA  NA  NA  NA   NA   NA  NA  NA
3    C  NA   NA  NA  NA  NA   NA   NA  NA  NA
4    D  NA   NA  NA  NA  NA   NA   NA  NA  NA

It has more columns than needed for this example, in reality I have much more and the last 2 columns will also be filled. So I have to match the right code and the right column name.

Any ideas how I could do that?

testframe:

structure(list(V1 = c("A", "A", "A", "C", "C", "A", "B", "A", 
"B", "C", "D", "A"), V2 = c("IIA", "S1MU", "AMU", "IIU", "IMA", 
"S2FU", "S1MU", "IIU", "S1II", "S1II", "IMA", "S1II"), V3 = c(1, 
5, 3, 6, 7, 8, 9, 13, 7, 9, 4, 3)), row.names = c(NA, -12L), class = "data.frame")

newframe:

structure(list(Code = c("A", "B", "C", "D"), IIA = c(NA, NA, 
NA, NA), S1MU = c(NA, NA, NA, NA), AMU = c(NA, NA, NA, NA), IIU = c(NA, 
NA, NA, NA), IMA = c(NA, NA, NA, NA), S2FU = c(NA, NA, NA, NA
), S1II = c(NA, NA, NA, NA), IFA = c(NA, NA, NA, NA), BIA = c(NA, 
NA, NA, NA)), row.names = c(NA, -4L), class = "data.frame")

Upvotes: 1

Views: 55

Answers (2)

akrun
akrun

Reputation: 887691

We can use dcast from either reshape2 or data.table. If we need the other missing columns from 'newframe', just assign those column names not found in the output but only in 'newframe' to NA and then change the column order to as in 'newframe'

library(data.table)
out <- dcast(setDT(df1), V1 ~ V2, value.var = 'V3')
setnames(out, 'V1', 'Code')
out[, setdiff(names(newframe), names(out)) := NA_real_]
setcolorder(out, names(newframe))

-output

> out
   Code IIA S1MU AMU IIU IMA S2FU S1II IFA BIA
1:    A   1    5   3  13  NA    8    3  NA  NA
2:    B  NA    9  NA  NA  NA   NA    7  NA  NA
3:    C  NA   NA  NA   6   7   NA    9  NA  NA
4:    D  NA   NA  NA  NA   4   NA   NA  NA  NA

Or if we want to use base R based on indexing, create a row names column and use rownames/column names for subsetting

out <- xtabs(V3 ~ V1 + V2, df1)
out[out == 0] <- NA
tmp <- as.matrix(`row.names<-`(newframe[-1], newframe$Code))
tmp[row.names(out), colnames(out)] <- out

-output

> tmp
  IIA S1MU AMU IIU IMA S2FU S1II IFA BIA
A   1    5   3  13  NA    8    3  NA  NA
B  NA    9  NA  NA  NA   NA    7  NA  NA
C  NA   NA  NA   6   7   NA    9  NA  NA
D  NA   NA  NA  NA   4   NA   NA  NA  NA

Upvotes: 2

Onyambu
Onyambu

Reputation: 79318

You could use tidyverse:

library(tidyverse)
testframe %>%
  rename(Code = V1) %>%
  pivot_wider(Code, V2, values_from = V3) %>%
  coalesce(newframe)

  # A tibble: 4 x 10
  Code    IIA  S1MU   AMU   IIU   IMA  S2FU  S1II IFA   BIA  
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <lgl>
1 A         1     5     3    13    NA     8     3 NA    NA   
2 C        NA    NA    NA     6     7    NA     9 NA    NA   
3 B        NA     9    NA    NA    NA    NA     7 NA    NA   
4 D        NA    NA    NA    NA     4    NA    NA NA    NA  

Upvotes: 3

Related Questions