Reputation: 209
I'm working on a project for a colleague to normalize GC data and convert from mol% to mass%.
Edit: I'm doing row-wise normalization, i.e. at each time the sum of the species in norm1
should be 100 (though each is then multiplied by its molecular weight, so the result no longer sums to 100). In a for loop it would be equivalent to a very burdensome:
for (i in seq_len(nrow(Nmass))) {
  total = sum(Nmass[i, norm1])  # row total, taken before any value is overwritten
  for (species in norm1) {
    Nmass[[species]][i] = Fmolwt[species, 'weight'] * Nmass[[species]][i] / total
  }
}
I have the CSV files imported and they are arranged as columns of species names and rows of injection times (working on dummy data so all zeros currently).
> Nmass[1:5,c("Time",norm1)]
# A tibble: 5 x 13
Time HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene `HTFeed_iso-butane` `HTFee~ `HTFeed~ `HTFe~ HTFee~ `HTFee~ `HTFee~
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-10-06 13:02:00 0 0 0 0 0 0 0 0 0 0 0 0
2 2019-10-06 13:17:00 0 0 0 0 0 0 0 0 0 0 0 0
3 2019-10-06 13:32:00 0 0 0 0 0 0 0 0 0 0 0 0
4 2019-10-06 13:47:00 0 0 0 0 0 0 0 0 0 0 0 0
5 2019-10-06 14:02:00 0 0 0 0 0 0 0 0 0 0 0 0
I have a working normalization routine:
norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane','HTFeed_n-Butane',
'HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')
Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x/sum(x)))
But when I attempt to implement the mass conversion using a prebuilt list of masses by species:
Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene','HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
'HTFeed_n-Butane','HTFeed_trans-2-butene','HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene','HTFeed_1,3-Butadiene')
The routine becomes (I think):
Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))
I get an error about sizes being different.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 0, 3696
In addition: Warning messages:
1: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
2: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
3: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
4: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
5: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
6: In x * Fmolwt[x, ] :
longer object length is not a multiple of shorter object length
7: In x * Fmolwt[x, ] :
I expect this is due to the apply statement attempting to pull in the molecular weights of everything named in norm1
at the same time.
Can I do this work the way I'm trying or do I need to write out a for loop?
Upvotes: 0
Views: 138
Reputation: 46898
You have a bug here:
Nmass[,norm1] = as.data.frame(apply(Nmass[,norm1], 2, function(x) x*Fmolwt[x,]/sum(x)))
With apply(..,2,..), the function receives each column as x, but from what I gather you need row-wise operations. Secondly, Fmolwt[x,] does not work because x holds the column's values, not the column name, so it cannot match the rownames of Fmolwt.
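To make the indexing problem concrete, here is a minimal sketch with a hypothetical two-species lookup table (the names `mw`, `by_name` and `by_value` are made up for illustration and are not part of your data). Indexing by name returns the weight, while indexing by a column's numeric values treats them as row positions. A zero-length result like the one below is a plausible source of the `0` in your error message.

```r
# Hypothetical two-species lookup table, same shape as Fmolwt
mw <- data.frame(weight = c(16.04, 30.07),
                 row.names = c("HTFeed_Methane", "HTFeed_Ethane"))

# Indexing by species name returns the matching weight
by_name <- mw["HTFeed_Methane", ]

# Inside apply(.., 2, ..), x is the column's numeric content (zeros in dummy data).
# Numeric indices are read as row positions, and position 0 selects nothing.
x <- c(0, 0, 0)
by_value <- mw[x, ]

print(by_name)           # the methane weight
print(length(by_value))  # zero elements were selected
```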
I simulate some data that looks like yours below, for illustration:
set.seed(1234)
norm1 = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene','HTFeed_iso-butane',
'HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')
values <- matrix(abs(rnorm(120,1000,100)),ncol=12)
colnames(values) = norm1
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
as.POSIXct("2017-01-02", tz = "UTC"),
length.out = 100)
Nmass = data.frame(Time=ts,values,check.names=F)
Fmolwt = data.frame(c(16.04,30.07,28.05,44.9,42.08,58.12,58.12,
56.11,56.11,56.11,56.11,54.1))
colnames(Fmolwt)[1] = 'weight'
rownames(Fmolwt) = c('HTFeed_Methane','HTFeed_Ethane','HTFeed_Ethylene',
'HTFeed_Propane','HTFeed_Propylene',
'HTFeed_iso-butane','HTFeed_n-Butane','HTFeed_trans-2-butene',
'HTFeed_1-Butene','HTFeed_Isobutylene','HTFeed_cis-2-butene',
'HTFeed_1,3-Butadiene')
This is what the simulated data looks like:
> head(Nmass,2)
Time HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene
1 2017-01-01 00:00:00 879.2934 952.2807 1013.4088
2 2017-01-01 00:14:32 1027.7429 900.1614 950.9314
HTFeed_Propane HTFeed_Propylene HTFeed_iso-butane HTFeed_n-Butane
1 1110.2298 1144.9496 819.3969 1065.659
2 952.4407 893.1357 941.7924 1254.899
HTFeed_trans-2-butene HTFeed_1-Butene HTFeed_Isobutylene HTFeed_cis-2-butene
1 1000.6893 982.2210 994.6841 1041.4524
2 954.4531 983.0006 1025.5196 952.5282
HTFeed_1,3-Butadiene
1 980.4065
2 935.0930
As a first step, take the first row as an example: normalize it by its total, then multiply by the corresponding molecular weights. For row 1:
Fmolwt[norm1,]*Nmass[1,norm1]/sum(Nmass[1,norm1])
Gives you the following results:
HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene HTFeed_Propane HTFeed_Propylene
1 1.176825 2.389309 2.371873 4.159423 4.020092
HTFeed_iso-butane HTFeed_n-Butane HTFeed_trans-2-butene HTFeed_1-Butene
1 3.973688 5.167942 4.685041 4.598576
HTFeed_Isobutylene HTFeed_cis-2-butene HTFeed_1,3-Butadiene
1 4.656926 4.875886 4.425653
If you want to use a built-in R function, the easiest is apply, which you have already used:
results = t(apply(Nmass[,norm1],1,function(x){
Fmolwt[norm1,]*x/sum(x)
}))
Following the example above, x is now a row from Nmass[,norm1], so x/sum(x) normalizes it and multiplying by Fmolwt[norm1,] applies the weights. The weights line up with the values because both are indexed by norm1, in the same order. apply returns the result for each row as a column, so we transpose with t(apply(..)) to get the same dimensions as Nmass[,norm1].
If we look at the first row, it gives the same output as the example above:
> results[1,]
HTFeed_Methane HTFeed_Ethane HTFeed_Ethylene
1.176825 2.389309 2.371873
HTFeed_Propane HTFeed_Propylene HTFeed_iso-butane
4.159423 4.020092 3.973688
HTFeed_n-Butane HTFeed_trans-2-butene HTFeed_1-Butene
5.167942 4.685041 4.598576
HTFeed_Isobutylene HTFeed_cis-2-butene HTFeed_1,3-Butadiene
4.656926 4.875886 4.425653
So if you want to put the results back, do
Nmass[,norm1] = results
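If you would rather avoid apply altogether, the same row-wise calculation can probably be written with rowSums and sweep. This is only a sketch on a small made-up table (`species`, `mol` and `mw` are hypothetical stand-ins for norm1, Nmass[,norm1] and the weights in Fmolwt). The last step is an assumption about your goal: if you want true mass%, each row needs to be rescaled by its weighted total so that it sums to 100 again.

```r
# Small made-up example: 2 injections, 3 species (mol%)
species <- c("A", "B", "C")                  # hypothetical species names
mol <- data.frame(A = c(50, 20), B = c(30, 30), C = c(20, 50))
mw  <- c(A = 16.04, B = 30.07, C = 28.05)    # molecular weights, named by species

m <- as.matrix(mol[, species])

# Row-wise normalization: each row is divided by its own total
frac <- m / rowSums(m)

# Multiply each column by its molecular weight
weighted <- sweep(frac, 2, mw[species], `*`)

# The same calculation with apply(), for comparison
via_apply <- t(apply(m, 1, function(x) mw[species] * x / sum(x)))
stopifnot(isTRUE(all.equal(weighted, via_apply, check.attributes = FALSE)))

# Assumed final step: rescale each row so it sums to 100 (mass%)
mass_pct <- 100 * weighted / rowSums(weighted)
print(rowSums(mass_pct))
```

Dividing a matrix by rowSums works here because R recycles the vector down the columns, so element [i, j] is divided by the total of row i.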
Upvotes: 1