How to find combinations that sum up to a certain value in each row of a dataframe

I've been asked in the job for a function for R that uses parallelization and, when you pass it a dataframe as the one added below, it should look for combinations that added up equals to the value in Importe_Pendiente for each row. The found combinations should be placed in a new column called "Combinations" and its value will be the IDs of the rows that, when summed, result in the same Importe_Pendiente value for that row. The value of Importe_Pendiente could be either a positive or negative. I'm pretty new in this world so I hope you can help me! Thank you very much.

The target dataframe is like this:

datos <- data.frame(
  ID = c("FCR23E015-1625", "E015-23-3583", "E015-23-3584", "E015-23-3585", "FCR23NIEB-0141"),
  Proveedor = c("2192", "6772", "6772", "6772", "7403"),
  Descripcion = c("Factura FCR23E015-1625", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "Factura FCR23NIEB-0141"),
  Importe = c(-2330, 54.8, 54.8, 66, -1029),
  Importe_Pendiente = c(-2330, 45, 55, 100, -1029)
)

And the output should be something like this:

 ID             Proveedor Descripcion                  Importe Importe_Pendiente Combinaciones     
  <chr>          <chr>     <chr>                          <dbl>             <dbl> <chr>             
1 FCR23E015-1625 2192      Factura FCR23E015-1625       -2330             -2330 NO_COMBINACIONES   
2 E015-23-3583   6772      AMAZON BUSINESS EU, S.A.R.L.    54.8              45 NO_COMBINACIONES      
3 E015-23-3584   6772      AMAZON BUSINESS EU, S.A.R.L.    54.8              55 NO_COMBINACIONES      
4 E015-23-3585   6772      AMAZON BUSINESS EU, S.A.R.L.    66               100 [E015-23-3583+E015-23-3584]
5 FCR23NIEB-0141 7403      Factura FCR23NIEB-0141       -1028             -1028 NO_COMBINACIONES

In case that some rows have more than 1 combination possible it would be nice that the combinations are split with brackets [].

I've spent a lot of time trying to modify this script which I took from chat-gpt but I wasn't lucky enough to make it work.

plan(multisession)

buscar_combinaciones <- function(df) {
  # Casting the column Importe_Pendiente to numeric type
  df$Importe_Pendiente <- as.numeric(df$Importe_Pendiente)
  
  resultado <- df %>%
    group_by(Proveedor) %>%
    mutate(Combinaciones = {
      if (n() < 2) {
        "NO_COMBINACIONES"
      } else {
        idx_comb <- combn(n(), 2)
        combinations_str <- vector("character", ncol(idx_comb))
        
        for (i in seq_along(combinations_str)) {
          combination <- Importe_Pendiente[idx_comb[, i]]
          if (sum(combination) == 0) {
            combinations_str[i] <- paste(ID[idx_comb[, i]], collapse = "+")
          }
        }
        
        valid_combinations <- na.omit(combinations_str)
        
        if (length(valid_combinations) == 0) {
          "NO_COMBINACIONES"
        } else {
          paste(valid_combinations, collapse = ", ")
        }
      }
    }) %>%
    ungroup()
  
  return(resultado)
}

# Adjusting the number of cores of the CPU to be used while parallelizing
# Only multiple cores will be used when there are more than 1 row for each Proveedor
num_nucleos <- ifelse(length(unique(datos$Proveedor)) > 1, 4, 1)
plan(multisession, workers = num_nucleos)

Upvotes: 1

Views: 66

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 102609

Probably you can try the code below

datos %>%
    mutate(Combinationes = {
        if (n() < 2) {
            "NO_COMBINACIONES"
        } else {
            s <- unlist(
                lapply(
                    2:n(),
                    \(k) {
                        combn(1:n(),
                            k,
                            FUN = \(...) setNames(
                                sum(Importe_Pendiente[...]),
                                sprintf("[%s]", paste0(ID[...], collapse = "+"))
                            ),
                            simplify = FALSE
                        )
                    }
                )
            )
            v <- tapply(names(s), s, toString)
            replace_na(
                v[match(as.character(Importe_Pendiente), names(v))],
                "NO_COMBINACIONES"
            )
        }
    }, .by = Proveedor)

which gives

              ID Proveedor                  Descripcion Importe
1 FCR23E015-1625      2192       Factura FCR23E015-1625 -2330.0
2   E015-23-3583      6772 AMAZON BUSINESS EU, S.A.R.L.    54.8
3   E015-23-3584      6772 AMAZON BUSINESS EU, S.A.R.L.    54.8
4   E015-23-3585      6772 AMAZON BUSINESS EU, S.A.R.L.    66.0
5 FCR23NIEB-0141      7403       Factura FCR23NIEB-0141 -1029.0
  Importe_Pendiente               Combinationes
1             -2330            NO_COMBINACIONES
2                45            NO_COMBINACIONES
3                55            NO_COMBINACIONES
4               100 [E015-23-3583+E015-23-3584]
5             -1029            NO_COMBINACIONES

Upvotes: 0

Nir Graham
Nir Graham

Reputation: 5167

Here is a simple non parallel implementation; This is unlikely to scale well

# https://stackoverflow.com/questions/76787219/how-to-find-combinations-that-sum-up-to-a-certain-value-in-each-row-of-a-datafra
C
datos <- data.frame(
  ID = c("FCR23E015-1625", "E015-23-3583", "E015-23-3584", "E015-23-3585", "FCR23NIEB-0141"),
  Proveedor = c("2192", "6772", "6772", "6772", "7403"),
  Descripcion = c("Factura FCR23E015-1625", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "Factura FCR23NIEB-0141"),
  Importe = c(-2330, 54.8, 54.8, 66, -1029),
  Importe_Pendiente = c(-2330, 45, 55, 100, -1029)
)

(simpler_start <- datos |> select(ID, Importe_Pendiente))

(next_step <- rowwise(simpler_start) |>
  mutate(
    others_raw = list({
      \(x){
        simpler_start |>
          filter(ID != x) |>
          deframe()
      }
    }(ID)),
    len = length(unlist(others_raw)),
    combinations = list(flatten(map(seq_len(len), \(x){
      combn(others_raw, m = x, simplify = FALSE)
    }))),
    test_combs = list(map(combinations, \(x){
      sum(x) == Importe_Pendiente
    })),
    combs_solve = ({
      temp <- map2_chr(combinations, test_combs, \(x, y){
        if (y) {
          paste0(names(x), collapse = ",")
        } else {
          ""
        }
      })
      paste0(temp[nzchar(temp)], collapse = ";")
    })
  ))

Upvotes: 0

Related Questions