Chris Ruehlemann

Reputation: 21400

How to concatenate several columns into single vector in R using apply() family

Suppose I have a dataframe like this, with w1 representing words and d1, d2, etc. representing durations in discourse:

set.seed(12)
df <- data.frame(
  w1 = sample(LETTERS[1:4], 10, replace = TRUE),
  d1 = c(rep(NA, 3), round(rnorm(7),3)),
  d2 = c(round(rnorm(6),3), NA, round(rnorm(3),3)),
  d3 = c(round(rnorm(2),3), rep(NA,2), round(rnorm(6),3)),
  d4 = c(round(rnorm(1),3), NA, round(rnorm(8),3))
)
df
   w1     d1     d2    d3     d4
1   D     NA -0.043 0.314 -2.149
2   C     NA -0.113 0.407     NA
3   A     NA  0.457    NA  0.971
4   D -1.596  2.020    NA  1.145
5   C -0.309 -1.051 0.994 -0.525
6   D  0.449  0.735 0.856  0.250
7   A -0.977     NA 0.197 -0.429
8   A  0.190  0.539 0.834 -0.183
9   C  0.731 -1.314 0.847 -0.103
10  B -0.493 -0.250 1.954 -0.634

As d1, d2, etc. are in fact one and the same variable, I'd like to concatenate them into a single vector. This can easily be done as follows:

d <- c(df$d1, df$d2, df$d3, df$d4)
d
[1]     NA     NA     NA -1.596 -0.309  0.449 -0.977  0.190  0.731 -0.493 -0.043 -0.113  0.457  2.020
[15] -1.051  0.735     NA  0.539 -1.314 -0.250  0.314  0.407     NA     NA  0.994  0.856  0.197  0.834
[29]  0.847  1.954 -2.149     NA  0.971  1.145 -0.525  0.250 -0.429 -0.183 -0.103 -0.634

However, my real dataframe has many such duration columns, and concatenating them this way is tedious. So I tried the apply family of functions, but the results are not what I want:

lapply(df[,2:5], c)
$d1
[1]     NA     NA     NA -1.596 -0.309  0.449 -0.977  0.190  0.731 -0.493
$d2
[1] -0.043 -0.113  0.457  2.020 -1.051  0.735     NA  0.539 -1.314 -0.250
$d3
[1] 0.314 0.407    NA    NA 0.994 0.856 0.197 0.834 0.847 1.954
$d4
[1] -2.149     NA  0.971  1.145 -0.525  0.250 -0.429 -0.183 -0.103 -0.634

sapply(df[,2:5], c)
          d1     d2    d3     d4
[1,]     NA -0.043 0.314 -2.149
[2,]     NA -0.113 0.407     NA
[3,]     NA  0.457    NA  0.971
[4,] -1.596  2.020    NA  1.145
[5,] -0.309 -1.051 0.994 -0.525
[6,]  0.449  0.735 0.856  0.250
[7,] -0.977     NA 0.197 -0.429
[8,]  0.190  0.539 0.834 -0.183
[9,]  0.731 -1.314 0.847 -0.103
[10,] -0.493 -0.250 1.954 -0.634

How must the code be changed to get me the desired result, shown in d?

Upvotes: 1

Views: 732

Answers (2)

JaiPizGon

Reputation: 486

Try:

do.call("c", df[,2:5])
   d11    d12    d13    d14    d15    d16    d17    d18    d19   d110    d21    d22    d23    d24    d25    d26 
    NA     NA     NA -0.272 -0.315 -0.628 -0.106  0.428 -0.778 -1.294 -0.780  0.012 -0.152 -0.703  1.189  0.341 
   d27    d28    d29   d210    d31    d32    d33    d34    d35    d36    d37    d38    d39   d310    d41    d42 
    NA  0.507 -0.293  0.224  2.007  1.012     NA     NA -0.302 -1.025 -0.267 -0.199  0.131  0.146  0.362     NA 
   d43    d44    d45    d46    d47    d48    d49   d410 
 0.674  2.072 -0.541 -1.070 -0.372 -0.485  0.275 -0.480 
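
The names in that result (d11, d12, ...) are built by c() from each column name plus the element position. If a plain unnamed vector is wanted, as in the question's d, wrapping the call in unname() should drop them. A minimal sketch on a small made-up data frame (toy and its values are hypothetical, chosen only to mirror the question's layout):

```r
# Hypothetical toy data with the same layout as the question's df
toy <- data.frame(
  w1 = c("A", "B", "C"),
  d1 = c(NA, 1.5, 2.5),
  d2 = c(0.1, NA, 0.3)
)

# do.call() passes each selected column as a separate argument to c();
# unname() then strips the d11, d12, ... names from the combined vector
d <- unname(do.call("c", toy[, 2:3]))
d
```

The result has the same order as c(toy$d1, toy$d2), with the NA values kept in place.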

Upvotes: 0

jay.sf

Reputation: 72663

Maybe just unlist() will do.

as.numeric(unlist(df[2:5]))
# [1]     NA     NA     NA -0.272 -0.315 -0.628 -0.106  0.428 -0.778 -1.294 -0.780  0.012
# [13] -0.152 -0.703  1.189  0.341     NA  0.507 -0.293  0.224  2.007  1.012     NA     NA
# [25] -0.302 -1.025 -0.267 -0.199  0.131  0.146  0.362     NA  0.674  2.072 -0.541 -1.070
# [37] -0.372 -0.485  0.275 -0.480
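
Here as.numeric() serves to drop the names that unlist() attaches; passing use.names = FALSE to unlist() is another way to get the same unnamed vector. Since the real data has many duration columns, the columns could also be picked by name rather than by position. A rough sketch, assuming the duration columns are all named "d" followed by digits (that naming pattern and the toy data are assumptions, not taken from the real dataframe):

```r
# Hypothetical toy data; only the column naming mirrors the question
toy <- data.frame(
  w1 = c("A", "B", "C"),
  d1 = c(NA, 1.5, 2.5),
  d2 = c(0.1, NA, 0.3),
  d3 = c(4, 5, NA)
)

# Select every column whose name is "d" followed by one or more digits
dur_cols <- grep("^d[0-9]+$", names(toy), value = TRUE)

# Flatten the selected columns, column by column, without names
d <- unlist(toy[dur_cols], use.names = FALSE)
d
```

Columns are concatenated in the order they appear in the data frame, so this matches c(toy$d1, toy$d2, toy$d3).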

Upvotes: 2
