Reputation: 623
I just encountered a weird situation in RGui...I used the same script as always to get my data.frame into the right shape for ggplot2. So my data looks like the following:
time days treatment nucleic_acid habitat parallel disturbance variable cellcounts value
1 1 2 control dna water 1 none Proteobacteria batch 0.000000000
2 2 22 control dna water 1 none Proteobacteria batch 0.003586543
3 1 2 treated dna water 1 none Proteobacteria batch 0.000000000
4 2 22 treated dna biofilm 1 none Proteobacteria NA 0.000000000
'data.frame': 185648 obs. of 10 variables:
$ time : int 5 5 5 5 5 5 6 6 6 6 ...
$ days : int 62 62 62 62 62 62 69 69 69 69 ...
$ treatment : Factor w/ 2 levels "control","treated": 2 2 2 1 1 1 2 2 2 1 ...
$ parallel : int 1 2 3 1 2 3 1 2 3 1 ...
$ nucleic_acid: Factor w/ 2 levels "cdna","dna": 1 1 1 1 1 1 1 1 1 1 ...
$ habitat : Factor w/ 2 levels "biofilm","water": 1 1 1 1 1 1 1 1 1 1 ...
$ cellcounts : Factor w/ 4 levels "batch","high",..: NA NA NA NA NA NA NA NA NA NA ...
$ disturbance : Factor w/ 3 levels "high","low","none": 3 3 3 3 3 3 3 3 3 3 ...
$ variable : Factor w/ 656 levels "Proteobacteria",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0 0 0 0 0 0 0 0 0 0 ...
and I wanted aggregate
to calculate the mean value of my up to 3 parallels:
df_mean<-aggregate(value~time+days+treatment+nucleic_acid+habitat+disturbance+variable+cellcounts, data = df, mean)
afterwards, the level "biofilm" in column "habitat" is lost.
df_mean<-droplevels(df_mean)
str(df_mean)
'data.frame': 44608 obs. of 9 variables:
$ time : int 1 2 1 2 1 2 1 2 1 2 ...
$ days : int 2 22 2 22 2 22 2 22 2 22 ...
$ treatment : Factor w/ 2 levels "control","treated": 1 1 2 2 1 1 2 2 1 1 ...
$ nucleic_acid: Factor w/ 2 levels "cdna","dna": 2 2 2 2 2 2 2 2 2 2 ...
$ habitat : Factor w/ 1 level "water": 1 1 1 1 1 1 1 1 1 1 ...
$ disturbance : Factor w/ 3 levels "high","low","none": 3 3 3 3 3 3 3 3 3 3 ...
$ variable : Factor w/ 656 levels "Proteobacteria",..: 1 1 1 1 2 2 2 2 3 3 ...
$ cellcounts : Factor w/ 4 levels "batch","high",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value : num 0 0.00359 0 0 0 ...
So I spent a lot of time (I actually just realised this, there were many more issues that now all seem to be aggregate
related) looking into this. I removed the column "cellcounts" and it worked. Interestingly, the columns "cellcounts" and "habitat" always carry in case of "biofilm" the same, therefore redundant, information ("biofilm" goes always with "NA"). Is this the cause? But it always worked before, so I don't get my head around this. Was there a change to the base::aggregate
function or something like that? Do you have an explanation for me? I'm using R-3.4.0, other packages used are reshape, reshape2 and ggplot2
Thx a lot, a confused crazysantaclaus
Upvotes: 0
Views: 311
Reputation: 47300
The issue comes from the NA
, maybe your file was loaded differently in the past and these were stored as strings instead of NA values ? Here's a way to solve it by setting them to a "NA"
string:
levels(df$cellcounts) <- c(levels(df$cellcounts),"NA")
df$cellcounts[is.na(df$cellcounts)] <- "NA"
df_mean <- aggregate(value ~ time+days+treatment+nucleic_acid+habitat+disturbance+variable+cellcounts, data = df, mean,na.rm=TRUE)
df_mean<-droplevels(df_mean)
str(df_mean)
'data.frame': 4 obs. of 9 variables:
$ time : int 1 2 1 2
$ days : int 2 22 2 22
$ treatment : Factor w/ 2 levels "control","treated": 1 1 2 2
$ nucleic_acid: Factor w/ 1 level "dna": 1 1 1 1
$ habitat : Factor w/ 2 levels "biofilm","water": 2 2 2 1
$ disturbance : Factor w/ 1 level "none": 1 1 1 1
$ variable : Factor w/ 1 level "Proteobacteria": 1 1 1 1
$ cellcounts : Factor w/ 2 levels "batch","NA": 1 1 1 2
$ value : num 0 0.00359 0 0
data
df <- read.table(text=" time days treatment nucleic_acid habitat parallel disturbance variable cellcounts value
1 1 2 control dna water 1 none Proteobacteria batch 0.000000000
2 2 22 control dna water 1 none Proteobacteria batch 0.003586543
3 1 2 treated dna water 1 none Proteobacteria batch 0.000000000
4 2 22 treated dna biofilm 1 none Proteobacteria NA 0.000000000
",header=T)
Upvotes: 1