Reputation: 47
I have the following data frame (df1). It contains a row called "Mean" that holds the mean of each column.
       GDP per_capita
France   2          5
Spain    4         10
Italy    6         15
Mean     4         10
I want to create a function that adds one new column for each column of df1, where each new value is the cell minus its column mean, divided by that mean. Like this:
       GDP per_capita GDP_diff per_capita_diff
France   2          5  (2-4)/4       (5-10)/10
Spain    4         10  (4-4)/4      (10-10)/10
Italy    6         15  (6-4)/4      (15-10)/10
Mean     4         10  (4-4)/4      (10-10)/10
So at the end, it should look like this:
       GDP per_capita GDP_diff per_capita_diff
France   2          5     -0.5            -0.5
Spain    4         10        0               0
Italy    6         15      0.5             0.5
Mean     4         10        0               0
I have to assume that every dataframe that will be using this function has a row called "Mean". So far this is what I have:
new.function <- function(df){
name.df= colnames(df)
new.df = apply(df, FUN = function(x) (x-Mean)/Mean, MARGIN = 2)
colnames(new.df) = paste(name.df,"diff",sep ="_")
result = cbind(df,new.df)
return(result)
}
However, the outputs I'm getting are all wrong. It is neither subtracting nor dividing the way I want it to.
Upvotes: 1
Views: 39
Reputation: 39613
Try this using mutate() from dplyr to compute the variables directly, avoiding loops:
library(dplyr)
library(tidyr)
#Code
new <- df %>%
  mutate(GDP_diff = (GDP - mean(GDP)) / mean(GDP),
         per_capita_diff = (per_capita - mean(per_capita)) / mean(per_capita))
Output:
  GDP per_capita GDP_diff per_capita_diff
1   2          5     -0.5            -0.5
2   4         10      0.0             0.0
3   6         15      0.5             0.5
4   4         10      0.0             0.0
Some data used:
#Data
df <- structure(list(GDP = c(2L, 4L, 6L, 4L), per_capita = c(5L, 10L,
15L, 10L)), class = "data.frame", row.names = c("France", "Spain",
"Italy", "Mean"))
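Since the question asks for something that works on any data frame, the same idea can be written without naming the columns. This is only a sketch: it assumes dplyr 1.0.0 or later (for across()) and that every column is numeric, and new_df is just an illustrative name.

```r
library(dplyr)

# Same sample data as above
df <- structure(list(GDP = c(2L, 4L, 6L, 4L), per_capita = c(5L, 10L,
15L, 10L)), class = "data.frame", row.names = c("France", "Spain",
"Italy", "Mean"))

# Apply (x - mean(x)) / mean(x) to every column and
# store each result in a new column named <column>_diff
new_df <- df %>%
  mutate(across(everything(),
                ~ (.x - mean(.x)) / mean(.x),
                .names = "{.col}_diff"))
```

Leaving the "Mean" row in the data does not shift mean(x): adding the average of a set of values to that set leaves the average unchanged.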
Upvotes: 1
Reputation: 784
data.table approach:
x <- data.frame(GDP = c(2,4,6), per_capita=c(5,10,15))
rownames(x) <- c("F", "ES", "IT")
library(data.table)
setDT(x)
x[, `:=`(GDP_diff = (GDP - mean(GDP, na.rm = TRUE)) / mean(GDP, na.rm = TRUE),
         per_capita_diff = (per_capita - mean(per_capita, na.rm = TRUE)) / mean(per_capita, na.rm = TRUE))]
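If the column names are not known in advance, the same update can be sketched with .SD and .SDcols. The names cols and new_cols are only illustrative, and this assumes all columns are numeric:

```r
library(data.table)

x <- data.table(GDP = c(2, 4, 6), per_capita = c(5, 10, 15))

cols <- names(x)                    # columns to transform
new_cols <- paste0(cols, "_diff")   # names for the new columns

# Add one <column>_diff column per original column, by reference
x[, (new_cols) := lapply(.SD, function(v) (v - mean(v, na.rm = TRUE)) / mean(v, na.rm = TRUE)),
  .SDcols = cols]
```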
Upvotes: 1
Reputation: 73692
Your issue is the part (x-Mean)/Mean; Mean doesn't exist anywhere. You probably meant mean(x).
new.function <- function(df){
  name.df <- colnames(df)
  new.df <- apply(df, MARGIN=2, FUN=function(x) (x-mean(x))/mean(x))
  colnames(new.df) <- paste(name.df, "diff", sep ="_")
  result <- cbind(df, new.df)
  return(result)
}
new.function(df)
#        GDP per_capita GDP_diff per_capita_diff
# France   2          5     -0.5            -0.5
# Spain    4         10      0.0             0.0
# Italy    6         15      0.5             0.5
# Mean     4         10      0.0             0.0
Data:
df <- structure(list(GDP = c(2L, 4L, 6L, 4L), per_capita = c(5L, 10L,
15L, 10L)), class = "data.frame", row.names = c("France", "Spain",
"Italy", "Mean"))
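If you would rather use the values already stored in the "Mean" row instead of recomputing them, something like the following might work. It is a rough sketch: add_diff_cols is a made-up name, and it assumes every column is numeric and that a row named "Mean" exists.

```r
df <- structure(list(GDP = c(2L, 4L, 6L, 4L), per_capita = c(5L, 10L,
15L, 10L)), class = "data.frame", row.names = c("France", "Spain",
"Italy", "Mean"))

# Hypothetical helper: uses the "Mean" row as the reference values
add_diff_cols <- function(df) {
  m <- as.matrix(df)               # assumes all columns are numeric
  means <- m["Mean", ]             # one stored mean per column
  # Subtract each column's mean, then divide by it
  diffs <- sweep(sweep(m, 2, means, "-"), 2, means, "/")
  colnames(diffs) <- paste(colnames(df), "diff", sep = "_")
  cbind(df, diffs)
}

result <- add_diff_cols(df)
```

On the sample data this should give the same values as new.function(df), because the "Mean" row holds the column means.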
Upvotes: 1