user1655130
user1655130

Reputation: 429

Re arrange rows in a dataframe - R

UPDATED:

I have a dataframe as follows:

x <- data.frame("A" = c(15, 13.5, 12, 9.1), 
                "B" = c(13.6, 8.4, 6.7, NA), 
                "C" = c(8.5, 2.43, 1.23, NA),
                "D" = c(8.6, 6.43, 4.23, NA),
                "E" = c(7.5, 4.43, 3.23, NA))


And I need to arrange to something like this:

xR <- data.frame("A" = c(15, 13.5, 12, 9.1, NA, NA, NA, NA, NA), 
                 "B" = c(NA, 13.6, 8.4, 6.7, NA, NA, NA, NA, NA), 
                 "C" = c(NA, NA, 8.5, 2.43, 1.23, NA, NA, NA, NA),
                 "D" = c(NA, NA, 8.6, 6.43, 1.23, NA, NA, NA, NA),
                 "E" = c(NA, NA, NA, 7.5, 4.43, 3.23, NA, NA, NA))

The logic is as follows: I need to place the first value on col B where it is equal to or greater than the first value it meets in Col A and so on for Col B and Col C. I need to try automate it as my dataframe is huge. Thanks for your help.

Upvotes: 0

Views: 91

Answers (2)

ekoam
ekoam

Reputation: 8844

Perhaps try these two functions

align_proper <- function(df, at = c("first", "last"), offset = 0L) {
  df <- as.matrix(df)
  dr <- nrow(df)
  dc <- ncol(df)
  m <- cbind(t(df), NA_real_)
  pos <- is.na(m); m[pos] <- Inf
  pos <- cumsum(c(0L, pmin(
    max.col(m[-1L, 1L] >= m[-dc, ], match.arg(at)), 
    max.col(pos[-dc, ], "first")
  ) - 1L) + offset)
  pos <- pos - min(pos)
  outlen <- max(pos) + dr
  pos <- pos + 1L
  proto <- matrix(
    nrow = outlen, ncol = dc, 
    dimnames = list(NULL, dimnames(df)[[2L]])
  )
  as.data.frame(Reduce(function(out, i) {
    out[seq.int(pos[[i]], by = 1L, length.out = dr), i] <- df[, i]
    out
  }, seq_along(pos), proto))
}

align_top <- function(df) {
  collen <- max(colSums(!is.na(df)))
  as.data.frame(vapply(df, function(x) {
    x[!is.na(x)][seq_len(collen)]
  }, numeric(collen)))
}

We first use align_top to remove extra leading or trailing NAs (if any) for each column. You can skip this step if your actual data looks exactly like what you showed in your question, but we need to ensure a standardised input.

     A    B    C                  A    B    C
1 15.0   NA   NA             1 15.0 13.6 8.50
2 13.5 13.6   NA             2 13.5  8.4 2.43
3 12.0  8.4 8.50    ====>    3 12.0  6.7 1.23
4  9.1  6.7 2.43             4  9.1   NA   NA
5   NA   NA 1.23             
6   NA   NA   NA             
7   NA   NA   NA             
8   NA   NA   NA        

align_proper then gives you the desired output

> align_proper(align_top(xR))
     A    B    C
1 15.0   NA   NA
2 13.5 13.6   NA
3 12.0  8.4 8.50
4  9.1  6.7 2.43
5   NA   NA 1.23
6   NA   NA   NA

> align_proper(align_top(x))
     A    B    C
1 15.0   NA   NA
2 13.5 13.6   NA
3 12.0  8.4 8.50
4  9.1  6.7 2.43
5   NA   NA 1.23
6   NA   NA   NA

> align_proper(data.frame("A" = c(1.23, 2.43, 6.5, NA), 
+                         "B" = c(6.7, 8.4, 8.9, NA), 
+                         "C" = c(9.1, 12, 13.5, 15)), "last")
     A   B    C
1 1.23  NA   NA
2 2.43  NA   NA
3 6.50 6.7   NA
4   NA 8.4   NA
5   NA 8.9  9.1
6   NA  NA 12.0
7   NA  NA 13.5
8   NA  NA 15.0

Update

I add one more argument offset in the function. Now you can do this

> align_proper(data.frame("A" = c(1.23, 2.43, 6.5, NA), 
+                         "B" = c(6.7, 8.4, 8.9, NA), 
+                         "C" = c(9.1, 12, 13.5, 15)), "last", -1L)
     A   B    C
1 1.23  NA   NA
2 2.43 6.7   NA
3 6.50 8.4  9.1
4   NA 8.9 12.0
5   NA  NA 13.5
6   NA  NA 15.0

> align_proper(data.frame("A" = c(1.23, 2.43, 6.5, NA), 
+                         "B" = c(6.7, 8.4, 8.9, NA), 
+                         "C" = c(9.1, 12, 13.5, 15)), "last", 3L)
      A   B    C
1  1.23  NA   NA
2  2.43  NA   NA
3  6.50  NA   NA
4    NA  NA   NA
5    NA  NA   NA
6    NA 6.7   NA
7    NA 8.4   NA
8    NA 8.9   NA
9    NA  NA   NA
10   NA  NA   NA
11   NA  NA  9.1
12   NA  NA 12.0
13   NA  NA 13.5
14   NA  NA 15.0

Upvotes: 2

Allan Cameron
Allan Cameron

Reputation: 174586

I don't know of an way to easily automate this, but you can do it in base R with the following algorithm:

x <- data.frame("A" = c(1.23, 2.43, 6.5, NA), 
                "B" = c(6.7, 8.4, 8.9, NA), 
                "C" = c(9.1, 12, 13.5, 15))


x <- as.list(x)

for(i in 2:length(x)) {
  x[[i]] <- c(rep(NA, which.max(x[[i-1]][x[[i-1]] < x[[i]][1]])), x[[i]])
}

x <- lapply(x, function(y) c(y, rep(NA, max(sapply(x, length)) - length(y))))

as.data.frame(x)
#>       A   B    C
#> 1  1.23  NA   NA
#> 2  2.43  NA   NA
#> 3  6.50  NA   NA
#> 4    NA 6.7   NA
#> 5    NA 8.4   NA
#> 6    NA 8.9   NA
#> 7    NA  NA  9.1
#> 8    NA  NA 12.0
#> 9    NA  NA 13.5
#> 10   NA  NA 15.0

Upvotes: 1

Related Questions