Reputation: 9
I am currently working with a data set of 760 metabolites. These metabolites were provided to 15 bacterial species. Further, their growth was monitored at 2 optical densities (OD), in triplicate. Therefore, I have a data set with 1520 rows and 17 columns.
optins formdlim='.';
data = Vera`
input Metabolite$ OD P1 P2 P3 P4 P5 P6 P7 C8 C9 C10 B11 B12 B13 B14 B15
;
cards;
proc print; run;
What I want to do is to find out whether the data of the 2 optical densities is the same (no significant difference between the data at 492 and 630 for each metabolite. Therefore, I wrote the following code:
PROC mixed DATA=Vera;
CLASS OD ;
MODEL P1 = OD / SOLUTION ;
lsmeans OD/ diff;
RUN;
With this you can analyse the differences between the ODs at 492 and 630 for each of the bacterial species. However, you would have to separate the data and run each metabolite separately. That would not be a problem with a small number of variables, but I have 760. So I don't want to repeat the procedure and input manually the code 760 times. I want to write a macro to use in SAS and repeat the same syntax for each of the metabolites. How can I do that?
My data looks like this:
Metabolite OD P1 P2 P3 P4 P5 P6 P7 C8 C9 C10 B11 B12 B13 B14 B15
C1 492 0.80318008 0.834511094 0.755462174 0.947215787 0.887920107 0.941135272 0.854403285 0.827162124 0.774818623 1.043873527 0.980611933 0.99175232 0.899985465 2.323935576 0.989680725
C1 492 1.015295591 0.937931127 0.862409875 1.035489644 1.020100969 1.432972263 1.20098598 1.014347313 1.024901914 1.350518389 1.228546301 1.058456868 1.021602321 0.882652756 1.068231275
C1 492 0.810476853 0.767190317 0.566538969 1.160767653 1.036374265 1.007790833 1.190486783 1.113972414 0.325186332 0.907718954 1.675218213 0.906072763 1.410147143 1.060946843 1.067602052
C1 630 0.961524961 1.005846657 0.847824375 1.025462906 0.976906071 0.976627864 1.01474825 0.903212955 0.934967536 0.882814468 1.001740347 0.903248894 0.996416257 1.02681187 0.916566129
C1 630 1.554650956 0.737506567 2.452827299 1.037786536 0.874060377 0.950382623 1.081525591 2.143129784 1.077641166 1.993884723 1.685291793 0.927601975 1.097186964 0.84841252 0.942020551
C1 630 3.397638555 3.48494389 2.736307131 4.485634181 4.927877673 4.754434301 5.041446678 3.008039216 1.24514729 3.849372819 3.335763153 4.537001962 4.347699905 2.650736885 5.007861571
C2 492 0.621121776 0.655197791 0.624464533 0.774748488 0.835036637 0.890241965 1.050214203 0.766379479 0.499753317 0.708279952 0.851083004 0.833468896 0.842360044 0.536406298 0.722104984
C2 492 1.75496053 1.625140448 1.234260466 1.600459563 1.805650674 3.902582698 4.366733197 4.3322092 0.884777351 3.659221055 3.698372956 4.424445968 3.911657965 1.184654064 3.032617686
C2 492 1.136163306 0.990741638 1.008046619 1.090941503 1.065424996 1.286243284 1.162517672 1.086776372 1.050708989 0.947436205 1.255244694 1.097283143 1.064965485 1.025620139 0.974254224
C2 630 1.113004223 1.481277257 1.117820203 1.606865598 1.547740666 1.923981394 1.79028251 1.600927099 0.651330519 1.688562315 1.671669463 1.596206391 1.999786168 1.112853138 1.95607287
C2 630 0.802575958 0.63027506 0.688188658 0.879770793 0.779821048 0.884177322 0.942509034 0.755849107 0.630951119 0.712527463 0.897567203 0.847457282 0.838313324 0.696858072 0.737402398
C2 630 3.868652818 3.623364192 2.899296194 4.850127834 5.171682933 5.239876518 5.407341626 3.381502495 1.345204779 4.170354345 3.676830466 4.893081332 4.646074976 2.792233812 5.15275719
I tried creating a macro but I think I am not getting what I would like to see. Again, I have 16 bacterial species, which were grown in presence of 760 metabolites. The growth was measured at 2 different OD. I want to find out whether the OD measurement for each metabolite on each species is significantly different. I modified the macro and the model as follows:
options formdlim='-';
data vera;
input Metabolite$ OD P1 P2 P3 P4 P5 P6 P7 C8 C9 C10 B11 B12 B13 B14 B15 B16;
cards;
%macro metabolites(varsel);
PROC mixed DATA=Vera;
CLASS OD ;
MODEL &varsel = OD / ddfm=kr ;
lsmeans OD/pdiff;
RUN;
%mend
%metabolites (P1);
%metabolites (P2);
%metabolites (P3);
%metabolites (P4);
%metabolites (P5);
%metabolites (P6);
%metabolites (P7);
%metabolites (C8);
%metabolites (C9);
%metabolites (C10);
%metabolites (B11);
%metabolites (B12);
%metabolites (B13);
%metabolites (B14);
%metabolites (B15);
%metabolites (B16);
With this model I can see that all the metabolites measured in a particular species at a specific OD (492 nm, for example) are not significantly different from ALL the metabolites measured at 630 nm. Unfortunately, this does not have biological relevance and I am still needing to repeat the syntax every time that I want to find significant differences among ODs in a particular metabolite in a particular species.
I also tried sorting out the data set with the "BY" statement but I did not get any different output. Is there anything else that I am missing?
Upvotes: 0
Views: 746
Reputation: 21274
Change your data structure so you can use the BY statement as mentioned above. I haven't tested it but something like the following should work:
*flip your data;
data flipped;
set vera;
array bs(15) P1-P7 C8-C10 B11-B15;
do i = 1 to dim(bs);
BS = vname(bs(i)); *capture the name of the variable;
BS_Value = bs(i);
output;
end;
run;
proc sort data=flipped;
by metabolite bs;
run;
PROC mixed DATA=Vera;
By metabolite bs;
CLASS OD ;
MODEL BS_value = OD / SOLUTION ;
lsmeans OD/ diff;
RUN;
Upvotes: 0