Reputation: 14309
I have the following data frame:
> str(df)
'data.frame': 52 obs. of 3 variables:
$ n : int 10 20 64 108 128 144 256 320 404 512 ...
$ step : Factor w/ 4 levels "Step1","Step2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value: num 0.00178 0.000956 0.001613 0.001998 0.002975 ...
Now I would like to normalize df$value by dividing each value by the sum of the values that share the same n, so that I get percentages. The code below doesn't work, but it shows what I would like to achieve. I precompute the per-n sums into dfa and then try to divide the original df$value by the aggregated total dfa$value with matching n:
dfa <- aggregate(x=df$value, by=list(df$n), FUN=sum)
names(dfa)[names(dfa)=="Group.1"] <- "n"
names(dfa)[names(dfa)=="x"] <- "value"
df$value <- df$value / dfa[dfa$n==df$n,][[1]]
Upvotes: 5
Views: 6205
Reputation: 466
I would use ave:
set.seed(123)
df <- data.frame(n=rep(c(2,3,6,8), each=5), value = sample(5:60, 20))
df$value_2 <- ave(df$value, list(df$n), FUN=function(L) L/sum(L))
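A quick way to check the result, sketched here by rebuilding the same example data so the snippet runs on its own: the normalized values within each n should add up to 1. The name group_totals is only introduced for this illustration.

```r
# Same example data as above
set.seed(123)
df <- data.frame(n = rep(c(2, 3, 6, 8), each = 5), value = sample(5:60, 20))
df$value_2 <- ave(df$value, list(df$n), FUN = function(L) L / sum(L))

# Sum the shares within each n; every group should total 1
group_totals <- tapply(df$value_2, df$n, sum)
group_totals
```

Multiply value_2 by 100 if you want percentages rather than fractions.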
Upvotes: 4
Reputation: 13363
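The problem with the code you have is this line:
df$value <- df$value / dfa[dfa$n==df$n,][[1]]
The expression dfa$n==df$n compares the two vectors element by element, recycling the shorter one, and returns a logical vector of length max(length(dfa$n), length(df$n)) that tells you, for each index, whether the n values at that position are equal. That is a positional comparison rather than a lookup, so it can't be used to match dfa$n to df$n.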
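To make that concrete, here is a small sketch on made-up data (the toy df and the column name share are assumptions for illustration, not the asker's data). It shows the positional comparison that == performs, and one base function, match(), that does look up the row of dfa for each df$n.

```r
# Toy data, invented for illustration only
df <- data.frame(n = c(10, 10, 20, 20, 20), value = c(1, 3, 2, 2, 4))

# Per-n totals, built the same way as in the question
dfa <- aggregate(x = df$value, by = list(df$n), FUN = sum)
names(dfa) <- c("n", "value")

# == compares position by position and recycles the shorter vector;
# R warns here because the lengths (2 and 5) are not multiples of each other
positional <- suppressWarnings(dfa$n == df$n)
positional

# match() returns, for each df$n, the index of the row in dfa with the same n
idx <- match(df$n, dfa$n)
df$share <- df$value / dfa$value[idx]
df
```

Unlike merge, this keeps the rows of df in their original order.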
Using base functions, you can use aggregate and merge:
dfa <- aggregate(x=df$value, by=list(df$n), FUN=sum)
names(dfa) <- c("n","sum.value")
df2 <- merge(df,dfa,by="n",all = TRUE)
df2$value2 <- df2$value/df2$sum.value
Upvotes: 1
Reputation: 13363
I think the following works, using package data.table.
library(data.table)
df <- data.table(df)
# add value2 by reference: each value divided by the total for its n
df[,value2 := value/sum(value),by=n]
Upvotes: 5