Jay
Jay

Reputation: 1201

R - matching one value by combining values in a column

I have a data frame with two columns as below:

    Col1       Col2
1   7197.36    14.00
2        NA  5173.94
3        NA 13333.06
4   7004.38   473.32
5        NA  4980.61
6  26355.52   110.05
7        NA  1307.32
8        NA  6531.06
9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  2009.42
14       NA   502.89
15       NA  3182.86
16       NA       NA

I wanted to find matching rows in column 'Col2' corresponding to the single value in 'Col1'. For example, 7197.36 = 14.00 + 5173.94 + 2009.42 (rows 1,2,13 in 'Col2')

Here, sum of 'Col1' = sum of 'Col2'

The final data frame should look like this:

   Col1    Col2
 1   7197.36    14.00
 2        NA  5173.94
 3        NA  2009.42
 4   7004.38   473.32
 5        NA  6531.06
 6  26355.52   110.05
 7        NA  1307.32
 8        NA 13333.06
 9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  4980.61
14       NA   502.89
15       NA  3182.86
16       NA       NA

Can anybody help me?

Upvotes: 7

Views: 151

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269451

We solve it via integer linear programming solving the problem of finding the minimum objective value greater than or equal to the target and if it is found to within numerical precision then return it; otherwise, return NULL.

library(lpSolve)

obj <- na.omit(DF$Col2)
targets <- na.omit(DF$Col1)
L <- lapply(targets, function(value) {
    iobj <- 100 * obj
    ivalue <- 100 * value
    res <- lp("min", iobj, t(iobj), ">=", ivalue, all.bin = TRUE)
    ok <- isTRUE(all.equal(ivalue, res$objval))
    if (ok) obj[res$solution == 1]
})
names(L) <- targets

giving:

> L

$`7197.36`
[1]   14.00 5173.94 2009.42

$`7004.38`
[1]  473.32 6531.06

$`26355.52`
[1] 13333.06   110.05  1307.32  3777.65  7827.44

$`8753.22`
[1] 4980.61   85.00    1.86  502.89 3182.86

Note 1: Later the question was modified to request this form of output:

transform(stack(L), Col1 = ifelse(duplicated(ind), NA, as.numeric(paste(ind))), 
                    Col2 = values)[3:4]

Note 2: We used this as DF

Lines <- "    Col1       Col2
1   7197.36    14.00
2        NA  5173.94
3        NA 13333.06
4   7004.38   473.32
5        NA  4980.61
6  26355.52   110.05
7        NA  1307.32
8        NA  6531.06
9        NA  3777.65
10       NA  7827.44
11  8753.22    85.00
12       NA     1.86
13       NA  2009.42
14       NA   502.89
15       NA  3182.86
16       NA       NA"

DF <- read.table(text = Lines, header = TRUE)

Upvotes: 7

Veerendra Gadekar
Veerendra Gadekar

Reputation: 4472

Here is a way using combinations from gtools (Won't be very efficient for huge data sets)

library(gtools)
library(zoo)
library(splitstackshape)

data$Col1_mod = na.locf(data$Col1)

df = stack(
     lapply(split(data, f = data$Col1_mod), 
     function(x){ 
         tmp1 = data.frame(
                combinations(
                    length(data$Col2[!is.na(data$Col2)]),
                    length(x$Col2[!is.na(x$Col2)]),
                    data$Col2[!is.na(data$Col2)]));
         tmp1$rowsums = rowSums(tmp1); 
         tmp2 = tmp1[tmp1$rowsums == unique(x$Col1_mod),];   
         toString(tmp2[,!colnames(tmp2) %in% 'rowsums'])
     }))

this will give

#> df
#                                       values      ind
#1                             473.32, 6531.06  7004.38
#2                        14, 2009.42, 5173.94  7197.36
#3          1.86, 85, 502.89, 3182.86, 4980.61  8753.22
#4 110.05, 1307.32, 3777.65, 7827.44, 13333.06 26355.52

you can reshape it using cSplit from splitstackshape

out = cSplit(setDT(df), 'values', ',', 'long')

 #>out
 #     values      ind
 #1:   473.32  7004.38
 #2:  6531.06  7004.38
 #3:    14.00  7197.36
 #4:  2009.42  7197.36
 #5:  5173.94  7197.36
 #6:     1.86  8753.22
 #7:    85.00  8753.22
 #8:   502.89  8753.22
 #9:  3182.86  8753.22
#10:  4980.61  8753.22
#11:   110.05 26355.52
#12:  1307.32 26355.52
#13:  3777.65 26355.52
#14:  7827.44 26355.52
#15: 13333.06 26355.52

Upvotes: 2

Related Questions