Reputation: 43
I have a dataframe with multiple columns and multiple rows, and my goal is to add, for every column, a new column right after it containing each value's proportion of that column's total sum.
I have something like:
a b c
1 4 5
8 2 3
1 4 2
and I'm trying to transform it into something like:
a a.2 b b.2 c c.2
1 0.1 4 0.4 5 0.5
8 0.8 2 0.2 3 0.3
1 0.1 4 0.4 2 0.2
But I can't figure out a way to NAME those new columns in add_column inside a loop.
So far, my code is as follows:
j=1
while (j <= length(colnames(eleicao))) {
  i <- colnames(sample)[j]
  nam <- paste("prop", i, sep = ".")
  j=j+1
  sample <- add_column(sample, parse(nam) = as.list(sample[i]/colSums(sample[i]))[[1]] .after = i)
}
I always get the same problem: Error: Column 'nam' already exists.
How can I accomplish my goal? How can I make add_column understand that I'm trying to name the column using the VALUE of 'nam'?
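For reference, here is a minimal sketch of the loop with the column named from the value of nam. It assumes add_column comes from the tibble package and that its name-value arguments accept rlang's `!!nam :=` injection syntax; the sample data and the ".2" suffix are taken from the example above, and df is just an illustrative name.

```r
library(tibble)

# Example data from the question
df <- tibble(a = c(1, 8, 1), b = c(4, 2, 4), c = c(5, 3, 2))

# names(df) is evaluated once, before the loop adds any column,
# so only the original columns are visited
for (i in names(df)) {
  nam <- paste(i, "2", sep = ".")
  # `!!nam :=` uses the string stored in nam as the new column's name
  df <- add_column(df, !!nam := df[[i]] / sum(df[[i]]), .after = i)
}

names(df)
```

A plain `nam = ...` would create a column literally called "nam", which is why the second iteration fails with "Column 'nam' already exists".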
Upvotes: 1
Views: 1071
Reputation: 2164
The following solution relies on dplyr, which is included in the tidyverse.
library(tidyverse)
df <- tibble(
  a = c(1, 8, 1),
  b = c(4, 2, 4),
  c = c(5, 3, 2)
)
df %>%
  mutate_all(funs(prop = . / sum(.)))
Which returns
# A tibble: 3 x 6
      a     b     c a_prop b_prop c_prop
  <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     1     4     5    0.1    0.4    0.5
2     8     2     3    0.8    0.2    0.3
3     1     4     2    0.1    0.4    0.2
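In newer dplyr releases funs() is deprecated and mutate_all() is superseded, so the same idea can be sketched with across(), assuming dplyr 1.0.0 or later. The last step, which interleaves the columns so each proportion sits right after its source column, is an addition to match the layout asked for in the question; res and idx are illustrative names.

```r
library(dplyr)
library(tibble)

df <- tibble(a = c(1, 8, 1), b = c(4, 2, 4), c = c(5, 3, 2))

# .names builds each new column name from the original one
res <- df %>%
  mutate(across(everything(), ~ .x / sum(.x), .names = "{.col}_prop"))

# Interleave by position: original column k is followed by its proportion
idx <- order(rep(seq_along(df), 2))
res <- res[idx]

names(res)
```

Ordering by position rather than by name keeps the interleaving correct even when the column names are not in alphabetical order.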
Upvotes: 2
Reputation: 887088
Here is an option using prop.table (df1 is the data frame from the question):
cbind(df1, prop.table(as.matrix(df1), 2))[order(rep(names(df1), 2))]
# a a.1 b b.1 c c.1
#1 1 0.1 4 0.4 5 0.5
#2 8 0.8 2 0.2 3 0.3
#3 1 0.1 4 0.4 2 0.2
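A self-contained variant of the same idea, sketched under two assumptions: the new columns should carry the ".2" suffix from the question, and the column names may not already be in alphabetical order, so the interleaving is done by position instead of by name. props and res are illustrative names.

```r
# Example data from the question
df1 <- data.frame(a = c(1, 8, 1), b = c(4, 2, 4), c = c(5, 3, 2))

# Column-wise proportions (margin 2 = columns)
props <- as.data.frame(prop.table(as.matrix(df1), 2))
names(props) <- paste0(names(df1), ".2")

# Interleave by position: a, a.2, b, b.2, c, c.2
res <- cbind(df1, props)[order(rep(seq_along(df1), 2))]

names(res)
```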
Upvotes: 2
Reputation: 28339
A slightly sloppy solution (using apply):
# Using OP's data stored in df
res <- do.call(cbind, apply(df, 2, function(x) data.frame(x, y = x / sum(x))))
# a.x a.y b.x b.y c.x c.y
# 1 1 0.1 4 0.4 5 0.5
# 2 8 0.8 2 0.2 3 0.3
# 3 1 0.1 4 0.4 2 0.2
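# Rename: drop the ".x" suffix and turn ".y" into ".2"
# (the dot is escaped and anchored so only the literal suffix matches)
colnames(res) <- sub("\\.x$", "", sub("\\.y$", ".2", names(res)))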
Upvotes: 2