Phil

Reputation: 4444

Sum column and add result to dataframe in R

I currently have a dataframe in R that contains one variable with a unique identifier and several variables that contain only binary responses (0 or 1).

A simplified version of my dataframe with two example rows:

c.names <- c("ID", "male", "female")
df <- c("ADH0004", 0, 1,
        "ADH0005", 1, 0)
df <- matrix(df, nrow = 2, byrow = T)
df <- as.data.frame(df)
names(df) <- c.names
df

In my final dataframe I will have potentially several hundred variables, all binary. I want to find a way to:

  1. obtain the column sum for each variable, and
  2. assign the column sum (1.) to a new variable (with the total copied in to each row)
  3. repeat this over each variable, so that I have n variables, and the same number of new variables with the totals in.

Returning to my simple dataframe example, my new dataframe would look like:

c.names <- c("ID", "male", "female", "male_t", "female_t")
df <- c("ADH0004", 0, 1, 1, 1,
        "ADH0005", 1, 0, 1, 1)
df <- matrix(df, nrow = 2, byrow = T)
df <- as.data.frame(df)
names(df) <- c.names
df

To do this for one variable at a time is easy (even for me). I would simply:

df$male_t <- sum(df$male)

I could do this for each variable manually, but I expect I could end up with up to a hundred, so I want to loop over this. I don't mind using a for loop (rather than apply) if that's easier, because I have a relatively small number of loops to do, so ease of coding is more of a priority than absolute speed of the code. Nevertheless I've tried both apply and for approaches.

for:

varlist <- c("male", "female")
for (i in varlist) {
  df$i_t <- df$i
}

(I've tried here to emulate a for loop I saw in Stata, where the total variables are generated with `i'_t, but this doesn't seem to work in R.)

I've also tried apply:

apply(df[c("male", "female")], MARGIN = 2, sum)

This gets me closer to my desired outcome, but I don't know how to save the column sums in the dataframe as new columns, rather than simply printing them to the console as happens now.

Any suggestions would be greatly appreciated. Naturally, I've looked extensively on both Stack Overflow and the wider internet. Phil

Upvotes: 1

Views: 6186

Answers (3)

tsdata5

Reputation: 46

Your dataframe df is full of factors, so I picked the two variables male and female and converted them to numeric:

df[,c(2,3)] <- apply(df[,c(2,3)],2,as.numeric)

Then sum these two variables and bind the totals to df:

cbind(df, as.data.frame(t(colSums(df[, c(2, 3)])))) # Is this the result you want?
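A note on the conversion step, as a minimal sketch: if the columns really are factors (which was the default when converting text in versions of R before 4.0.0), calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. The apply() call above avoids this because apply() first coerces the data frame to a character matrix. The vector f below is a made-up example, not part of the question's data.

```r
# Hypothetical factor holding binary responses stored as text
f <- factor(c("0", "1", "1"))

# as.numeric() on a factor gives the level codes (1, 2, 2), not the values
codes <- as.numeric(f)

# Going through character first recovers the original numbers (0, 1, 1)
values <- as.numeric(as.character(f))

sum(codes)   # 5, the sum of the level codes
sum(values)  # 2, the sum of the actual responses
```

When converting column by column (for example with lapply or sapply), the as.numeric(as.character(x)) form is the safer choice.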

Upvotes: 0

Robert Krzyzanowski

Reputation: 9344

You could try:

   for(var in colnames(df)[-1]) {
     df[[paste0(var, '_t')]] <- sum(df[[var]])
   }
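For reference, a self-contained sketch of this loop on the question's two-row example. It assumes the binary columns are converted to numeric first, since sum() fails on character or factor columns. It also shows why the attempt in the question did not work: df$i_t refers to a column literally named "i_t", whereas [[ ]] accepts the name built by paste0(). The name vars is illustrative.

```r
# Rebuild the question's example; stringsAsFactors = FALSE keeps text as character
df <- data.frame(ID = c("ADH0004", "ADH0005"),
                 male = c("0", "1"),
                 female = c("1", "0"),
                 stringsAsFactors = FALSE)

vars <- c("male", "female")

# Convert the binary columns from character to numeric before summing
df[vars] <- lapply(df[vars], as.numeric)

for (var in vars) {
  # [[ ]] takes a computed column name; the scalar sum is recycled to every row
  df[[paste0(var, "_t")]] <- sum(df[[var]])
}

df
```

With several hundred variables, vars could be taken from names(df)[-1] instead of being typed out, as in the loop above.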

Upvotes: 2

user20650

Reputation: 25864

# Your columns 2 & 3 are character (or factor in older R) - convert to numeric
df[2:3] <- lapply(df[2:3], function(x) as.numeric(as.character(x)))

# Get column totals for all variables except the first
totals <- colSums(df[-1])

# Add to df: totals is transposed so is added as columns
# values of totals are recycled, so added to all rows of df
df <- data.frame(df, t(totals))
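If the new columns should carry the _t suffix from the question, rather than the de-duplicated names (such as male.1) that data.frame() generates when column names repeat, one possible loop-free variant is sketched below. It assumes the binary columns are already numeric; the names vars and totals are illustrative.

```r
# The question's example with numeric binary columns
df <- data.frame(ID = c("ADH0004", "ADH0005"),
                 male = c(0, 1),
                 female = c(1, 0))

# Every column except the identifier
vars <- setdiff(names(df), "ID")
totals <- colSums(df[vars])

# Build one full-length column per total and name it <variable>_t
df[paste0(vars, "_t")] <- lapply(unname(totals), rep, times = nrow(df))

df
```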

Upvotes: 2
