RainieJin
RainieJin

Reputation: 51

Combine values from different rows into one row only when there is only one value in each column in R

I have a dataframe called "rawfile" as below.

structure(list(mz = c(123, 243, 277, 456, 886.555, 886.556, 887.565, 
887.567, 887.568), condition.1 = c(0.001, 0.003, 2e-04, NA, 0.000511135, 
NA, NA, 0.000162055, NA), condition.2 = c(NA, NA, 3e-04, 0.007, 
NA, 0.000661594, 1e-04, 3e-04, NA), condition.3 = c(NA, NA, 4e-04, 
NA, NA, 0.000502741, NA, NA, 2e-04), group = c(1, 2, 3, 4, 5, 
5, 6, 6, 6)), row.names = c(NA, -9L), class = "data.frame")

I want to move all the values to the top row within each group if there is more than one row present in one group. I used the following code and it works if there is only one value within each column in one group.

# Function to move all values to the top row based on group
move_values <- function(df) {
  if (nrow(df) > 1) {
    for (col in grep("^condition", names(df), value = TRUE)) {
      df[1, col] <- ifelse(any(!is.na(df[, col])), df[!is.na(df[, col]), col], NA)
      df[-1, col] <- NA  # Fill other rows with NA for the current column
    }
  }
  return(df)
}

# Group the data by 'group' and apply the move_values function
result <- rawfile %>%
  group_by(group) %>%
  do(move_values(.)) %>%
  ungroup()

However, for this dataframe, there is more than one value in the same column (condition 2) in group 6, and I want to keep this group unchanged if this happens.

Does anyone have any suggestions on how to filter out groups with more than one values in the same column? And move the values to the top row for other groups?

The expected dataframe I would like to have in the end is below:

structure(list(mz = c(123, 243, 277, 456, 886.555, 886.556, 887.565, 
887.567, 887.568), condition.1 = c(0.001, 0.003, 2e-04, NA, 0.000511135, 
NA, NA, 0.000162055, NA), condition.2 = c(NA, NA, 3e-04, 0.007, 
0.000661594, NA, 1e-04, 3e-04, NA), condition.3 = c(NA, NA, 4e-04, 
NA, 0.000502741, NA, NA, NA, 2e-04), group = c(1L, 2L, 3L, 4L, 
5L, 5L, 6L, 6L, 6L)), class = "data.frame", row.names = c(NA, 
-9L))

Upvotes: 2

Views: 37

Answers (1)

Ma&#235;l
Ma&#235;l

Reputation: 51994

You can create the condition with by summing the complete.cases across the selected columns, and then use this condition in an ifelse statement, and sort complete and incomplete cases.

library(dplyr)
df %>% 
  mutate(cond = rowSums(across(condition.1:condition.3, \(x) sum(complete.cases(x))) > 1),
         across(condition.1:condition.3, \(x) ifelse(cond == 0, c(x[!is.na(x)], x[is.na(x)]), x)),
         .by = group)

#        mz condition.1 condition.2 condition.3 group cond
# 1 123.000 0.001000000          NA          NA     1    0
# 2 243.000 0.003000000          NA          NA     2    0
# 3 277.000 0.000200000 0.000300000 0.000400000     3    0
# 4 456.000          NA 0.007000000          NA     4    0
# 5 886.555 0.000511135 0.000661594 0.000502741     5    0
# 6 886.556          NA          NA          NA     5    0
# 7 887.565          NA 0.000100000          NA     6    1
# 8 887.567 0.000162055 0.000300000          NA     6    1
# 9 887.568          NA          NA 0.000200000     6    1

Upvotes: 0

Related Questions