Ben A

Reputation: 209

Normalization with apply

I'm working on a project for a colleague to normalize GC data and convert from mol% to mass%.

Edit: I'm doing row-wise normalization, i.e. at each time the species in norm1 should sum to 100 (though once each is multiplied by its molecular weight they no longer sum to 100). In a for loop it would be equivalent to a very burdensome:

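for (i in seq_len(nrow(Nmass))) {
   # row total, taken before any entry in this row is overwritten
   total = sum(Nmass[i, norm1])
   for (species in norm1) {
      # mole fraction of this species times its molecular weight
      Nmass[[species]][i] = Fmolwt[species, ] * Nmass[[species]][i] / total
   }
}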

I have the CSV files imported and they are arranged as columns of species names and rows of injection times (working on dummy data so all zeros currently).

> Nmass[1:5,c("Time",norm1)]
# A tibble: 5 x 13
  Time                HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene `HTFeed_iso-butane` `HTFee~ `HTFeed~ `HTFe~ HTFee~ `HTFee~ `HTFee~
  <dttm>                       <dbl>         <dbl>           <dbl>          <dbl>            <dbl>               <dbl>   <dbl>    <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
1 2019-10-06 13:02:00              0             0               0              0                0                   0       0        0      0      0       0       0
2 2019-10-06 13:17:00              0             0               0              0                0                   0       0        0      0      0       0       0
3 2019-10-06 13:32:00              0             0               0              0                0                   0       0        0      0      0       0       0
4 2019-10-06 13:47:00              0             0               0              0                0                   0       0        0      0      0       0       0
5 2019-10-06 14:02:00              0             0               0              0                0                   0       0        0      0      0       0       0

I have a working normalization routine:

norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane','HTFeed_n-Butane',
        'HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')

Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x/sum(x)))

But when I attempt to implement the mass conversion using a prebuilt list of masses by species:

Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
                    'HTFeed_n-Butane','HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')

The routine becomes (I think):

Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))

I get an error about sizes being different.

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 0, 3696
In addition: Warning messages:
1: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
2: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
3: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
4: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
5: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
6: In x * Fmolwt[x, ] :
  longer object length is not a multiple of shorter object length
7: In x * Fmolwt[x, ] :

I expect this is due to the apply statement attempting to pull in the molecular weights of everything named in norm1 at the same time.

Can I do this work the way I'm trying or do I need to write out a for loop?

Upvotes: 0

Views: 138

Answers (1)

StupidWolf

Reputation: 46898

You have a bug here:

Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))

With apply(..,2,..), x is a column of Nmass, so sum(x) is a column total, whereas from what I gather you need row-wise operations. Secondly, Fmolwt[x,] does not do what you want because x holds the numeric values of that column, not species names, so it indexes Fmolwt by those values instead of looking up its rownames.
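To make both points concrete, here is a minimal sketch on toy data. The names toy, wt, A and B are made up for illustration and are not part of your data:

```r
# Hypothetical toy data: two "species" (A, B) and two injections
toy <- data.frame(A = c(1, 3), B = c(1, 1))
wt  <- data.frame(weight = c(10, 20), row.names = c("A", "B"))

# MARGIN = 2 passes each column as x, so sum(x) is a column total
by_col <- apply(toy, 2, function(x) x / sum(x))
colSums(by_col)    # every column sums to 1, the rows do not

# MARGIN = 1 passes each row as x, so sum(x) is a row total;
# apply returns one column per row, hence the t()
by_row <- t(apply(toy, 1, function(x) x / sum(x)))
rowSums(by_row)    # every row sums to 1

# Indexing by values picks row positions, not species
by_value <- wt[c(1, 3), ]      # row 1, then NA because there is no row 3
# Indexing by name looks up the species
by_name  <- wt[c("A", "B"), ]  # the weights for A and B, in that order
```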

I simulate some data that looks like yours below, for illustration:

set.seed(1234)

norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
'HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')

values <- matrix(abs(rnorm(120,1000,100)),ncol=12)
colnames(values) = norm1

ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
    as.POSIXct("2017-01-02", tz = "UTC"),
    length.out = 100)

Nmass = data.frame(Time=ts,values,check.names=F)

Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,
56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene',
'HTFeed_iso-butane','HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')

What the simulated data looks like:

> head(Nmass,2)
                 Time HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene
1 2017-01-01 00:00:00       879.2934      952.2807       1013.4088
2 2017-01-01 00:14:32      1027.7429      900.1614        950.9314
  HTFeed_Propane HTFeed_Propylene HTFeed_iso-butane HTFeed_n-Butane
1      1110.2298        1144.9496          819.3969        1065.659
2       952.4407         893.1357          941.7924        1254.899
  HTFeed_trans-2-butene HTFeed_1-Butene HTFeed_Isobutylene HTFeed_cis-2-butene
1             1000.6893        982.2210           994.6841           1041.4524
2              954.4531        983.0006          1025.5196            952.5282
  HTFeed_1,3-Butadiene
1             980.4065
2             935.0930

As a first step, take a single row as an example: normalize it by its total, then multiply each entry by the corresponding molecular weight. For row 1, do:

Fmolwt[norm1,]*Nmass[1,norm1]/sum(Nmass[1,norm1])

Gives you the following results:

  HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene
1       1.176825      2.389309        2.371873       4.159423         4.020092
  HTFeed_iso-butane HTFeed_n-Butane HTFeed_trans-2-butene HTFeed_1-Butene
1          3.973688        5.167942              4.685041        4.598576
  HTFeed_Isobutylene HTFeed_cis-2-butene HTFeed_1,3-Butadiene
1           4.656926            4.875886             4.425653

To do this for every row with a built-in R function, the easiest option is apply, which you already use, but over rows (MARGIN = 1) instead of columns:

results = t(apply(Nmass[,norm1],1,function(x){
      Fmolwt[norm1,]*x/sum(x)
    }))

Following what we had before, x is one row of Nmass[,norm1], so x/sum(x) normalizes that row, and multiplying by Fmolwt[norm1,] applies the weights. The weights line up with the species because both are selected with norm1, in the same order. apply returns each row's result as a column, so the output has to be transposed to get the same dimensions as Nmass[,norm1], hence the t(apply(..)).

If we look at the first row, it gives the same output as the example above:

> results[1,]
       HTFeed_Methane         HTFeed_Ethane       HTFeed_Ethylene 
             1.176825              2.389309              2.371873 
       HTFeed_Propane      HTFeed_Propylene     HTFeed_iso-butane 
             4.159423              4.020092              3.973688 
      HTFeed_n-Butane HTFeed_trans-2-butene       HTFeed_1-Butene 
             5.167942              4.685041              4.598576 
   HTFeed_Isobutylene   HTFeed_cis-2-butene  HTFeed_1,3-Butadiene 
             4.656926              4.875886              4.425653

So if you want to put the results back, do:

Nmass[,norm1] = results
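If apply turns out to be slow on a large table, the same row-wise calculation can also be sketched without a per-row function, using rowSums and sweep. The toy data and the names toy, wt, A and B below are hypothetical. The last step, rescaling each row to sum to 100, is my assumption about what you want for mass%, since the weighted values above do not sum to 100 on their own:

```r
# Hypothetical toy data: two species, two injections, in mol%
toy <- data.frame(A = c(50, 20), B = c(50, 80))
wt  <- c(A = 10, B = 30)   # made-up molecular weights

m <- as.matrix(toy)

# Row-wise normalization: element [i, j] is divided by the total of row i
molfrac <- m / rowSums(m)

# Multiply column j by the weight of species j (matched by column name)
weighted <- sweep(molfrac, 2, wt[colnames(m)], "*")

# Assumed final step: rescale so that each row sums to 100 (mass%)
masspct <- 100 * weighted / rowSums(weighted)
masspct
```

With your data, m would be as.matrix(Nmass[,norm1]) and the weights would be Fmolwt[norm1,], which keeps the species order consistent in the same way as in the apply version.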

Upvotes: 1
