Daniela D

Reputation: 47

R: create a function that operates on different rows and creates a new column with its value

I have the following data frame (df1). It contains a row called "Mean" that holds the mean of each column.

         GDP    per_capita 
France    2         5
Spain     4         10
Italy     6         15
Mean      4         10

I want to create a function that adds, for each column of df1, a new column holding each cell's value minus its column mean, divided by that mean. Like this:

         GDP    per_capita   GDP_diff   per_capita_diff
France    2         5        (2-4)/4      (5-10)/10
Spain     4         10       (4-4)/4      (10-10)/10
Italy     6         15       (6-4)/4      (15-10)/10
Mean      4         10       (4-4)/4      (10-10)/10

So at the end, it should look like this:

       GDP    per_capita    GDP_diff   per_capita_diff
France   2        5         -0.5          -0.5
Spain    4        10            0           0
Italy    6        15         0.5           0.5
Mean     4        10           0            0

I have to assume that every data frame passed to this function has a row called "Mean". So far this is what I have:

new.function <- function(df){
  name.df= colnames(df)
  new.df = apply(df, FUN = function(x) (x-Mean)/Mean, MARGIN = 2)
  colnames(new.df) = paste(name.df,"diff",sep ="_")
  result = cbind(df,new.df)
  return(result)
}

However, the output I'm getting is wrong: it is not subtracting or dividing the way I want it to.

Upvotes: 1

Views: 39

Answers (3)

Duck

Reputation: 39613

Try this using mutate() from dplyr to compute the new variables directly, without loops:

library(dplyr)
# Relative deviation of each value from its column mean
new <- df %>%
  mutate(GDP_diff = (GDP - mean(GDP)) / mean(GDP),
         per_capita_diff = (per_capita - mean(per_capita)) / mean(per_capita))

Output:

  GDP per_capita GDP_diff per_capita_diff
1   2          5     -0.5            -0.5
2   4         10      0.0             0.0
3   6         15      0.5             0.5
4   4         10      0.0             0.0

Some data used:

#Data
df <- structure(list(GDP = c(2L, 4L, 6L, 4L), per_capita = c(5L, 10L, 
15L, 10L)), class = "data.frame", row.names = c("France", "Spain", 
"Italy", "Mean"))
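
The two mutate() expressions above repeat the same formula once per column. As a rough sketch, assuming dplyr 1.0.0 or later (where across() is available) and that every column is numeric, the same calculation can be applied to all columns at once. The sample data frame is rebuilt here only so the block runs on its own:

```r
library(dplyr)

# Sample data from the question, rebuilt so this block is self-contained
df <- data.frame(GDP = c(2, 4, 6, 4), per_capita = c(5, 10, 15, 10),
                 row.names = c("France", "Spain", "Italy", "Mean"))

# Apply (x - mean(x)) / mean(x) to every column and
# name each new column "<original name>_diff"
new <- df %>%
  mutate(across(everything(),
                ~ (.x - mean(.x)) / mean(.x),
                .names = "{.col}_diff"))
```

This avoids editing the code each time a column is added, at the cost of requiring a newer dplyr.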

Upvotes: 1

Ben

Reputation: 784

data.table approach:

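x <- data.frame(GDP = c(2, 4, 6), per_capita = c(5, 10, 15))
rownames(x) <- c("F", "ES", "IT")

library(data.table)

# Convert to a data.table by reference; note that this drops the row names
setDT(x)
# Add both relative-difference columns by reference
x[, `:=`(GDP_diff = (GDP - mean(GDP, na.rm = TRUE)) / mean(GDP, na.rm = TRUE),
         per_capita_diff = (per_capita - mean(per_capita, na.rm = TRUE)) / mean(per_capita, na.rm = TRUE))]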

Upvotes: 1

jay.sf

Reputation: 73692

Your issue is the part (x-Mean)/Mean: Mean doesn't exist anywhere. You probably meant mean(x).

new.function <- function(df){
  name.df<- colnames(df)
  new.df <- apply(df, MARGIN=2, FUN=function(x) (x-mean(x))/mean(x))
  colnames(new.df) <- paste(name.df, "diff", sep ="_")
  result <- cbind(df, new.df)
  return(result)
}

new.function(df)
#        GDP per_capita GDP_diff per_capita_diff
# France   2          5     -0.5            -0.5
# Spain    4         10      0.0             0.0
# Italy    6         15      0.5             0.5
# Mean     4         10      0.0             0.0

Data:

df <- structure(list(GDP = c(2L, 4L, 6L, 4L), per_capita = c(5L, 10L, 
15L, 10L)), class = "data.frame", row.names = c("France", "Spain", 
"Italy", "Mean"))
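
Since the question states that every data frame has a row called "Mean", another option is to read the means from that row instead of recomputing them with mean(x). The following is only a sketch in base R: the function name diff_from_mean_row and its mean_row argument are made up for illustration, and it assumes all columns are numeric.

```r
# Hypothetical helper: uses the existing "Mean" row as the reference values
diff_from_mean_row <- function(df, mean_row = "Mean") {
  stopifnot(mean_row %in% rownames(df))
  # Named numeric vector with one reference mean per column
  means <- unlist(df[mean_row, , drop = FALSE])
  # Subtract each column's mean, then divide by it (MARGIN = 2 means column-wise)
  centered <- sweep(as.matrix(df), 2, means, FUN = "-")
  rel_diff <- sweep(centered, 2, means, FUN = "/")
  colnames(rel_diff) <- paste(colnames(df), "diff", sep = "_")
  cbind(df, rel_diff)
}

# Sample data from the question, rebuilt so this block is self-contained
df <- data.frame(GDP = c(2, 4, 6, 4), per_capita = c(5, 10, 15, 10),
                 row.names = c("France", "Spain", "Italy", "Mean"))
result <- diff_from_mean_row(df)
```

With the sample data, both approaches give the same numbers, because adding a row equal to the column mean does not change that mean. They would differ if the "Mean" row held values other than the mean of all rows.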

Upvotes: 1
