Ina.Quest

Reputation: 165

Add output values as a new row to the source df

This seems like such a simple task; it's killing me that I can't figure it out.

I have the output after using apply and now all I want to do is add the output as a new row called uniq at the end of the data.frame.

df

ID  A    B    C
1   asd  dfg  ghj
2   qwe  sde  cdf
3   wed  thy  red
4   asd  sde  grf
5   swq  sde  hty

uniq = apply(df, 2, function(x)length(unique(x)))

uniq output: Named int [1:4]

ID  A   B   C
 5  4   3   5

new.df = rbind(df, uniq)

what I would like to see...

ID    A    B    C
1    asd  dfg  ghj
2    qwe  sde  cdf
3    wed  thy  red
4    asd  sde  grf
5    swq  sde  hty
5    4    3    5  

Error - There were 4 warnings (use warnings() to see them)

When I look at the data, a new row has been added, but the totals are not there. Instead I am getting NAs in each cell (except for two, but I have no idea why).

I saw that maybe I can't just use rbind because they are not the same type of object, and I even tried converting the output to a matrix like someone suggested, but it doesn't work. Arghhh!

new.df <- rbind(df, matrix(uniq, ncol=25))

Error in match.names(clabs, names(xi)) : names do not match previous names

I checked the headers and they matched - after all the uniq data came from the original df.

Any help greatly appreciated.

Upvotes: 0

Views: 163

Answers (1)

Rich Scriven

Reputation: 99371

It's likely that you've got factor columns. I'll start by saying that what you're attempting is not a very good idea anyway: the columns of a data frame hold the variables, so appending the counts as a row actually adds one extra observation to each column.

But you can solve your problem and get the result you desire by coercing the factor columns to characters and appending the calculation. Beginning with a data frame df whose columns have these classes:
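To see why the appended row comes back as NA, here is a minimal sketch that rebuilds the question's data with factor columns and repeats the rbind call. The factor columns are an assumption, since the question never shows the column classes. The counts 4, 3 and 5 are not among the existing factor levels, so R stores NA and warns about an invalid factor level.

```r
# Assumed reconstruction of the question's data; stringsAsFactors = TRUE
# forces factor columns, which is only a guess about the original df.
df <- data.frame(
  ID = 1:5,
  A = c("asd", "qwe", "wed", "asd", "swq"),
  B = c("dfg", "sde", "thy", "sde", "sde"),
  C = c("ghj", "cdf", "red", "grf", "hty"),
  stringsAsFactors = TRUE
)

# Number of distinct values per column, as in the question
uniq <- apply(df, 2, function(x) length(unique(x)))

# Each factor column rejects the new value and stores NA instead;
# the warnings are suppressed here only to keep the output quiet.
new.df <- suppressWarnings(rbind(df, uniq))

is.na(new.df[6, ])  # TRUE for the factor columns A, B, C; FALSE for ID
```

The integer ID column accepts the new value, which would explain why a few cells in the new row are not NA.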

sapply(df, class)
#       ID         A         B         C 
# "integer"  "factor"  "factor"  "factor" 

We can use a small function f to manipulate the columns:

f <- function(x) {
    # convert a factor to its character labels, then append the unique count
    c(if(is.factor(x)) levels(x)[x] else x, length(unique(x)))
}

After applying f to each column, ID is still numeric, but the other three columns are character vectors. Setting stringsAsFactors = FALSE when creating the new data frame keeps them as characters instead of converting them back to factors:

data.frame(lapply(df, f), stringsAsFactors = FALSE)
#   ID   A   B   C
# 1  1 asd dfg ghj
# 2  2 qwe sde cdf
# 3  3 wed thy red
# 4  4 asd sde grf
# 5  5 swq sde hty
# 6  5   4   3   5
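For anyone who wants to run the whole approach end to end, the following sketch rebuilds the example data, applies f, and checks the column classes of the result. As before, the factor columns in the rebuilt df are an assumption that the question does not confirm.

```r
# Assumed reconstruction of the question's data with factor columns
df <- data.frame(
  ID = 1:5,
  A = c("asd", "qwe", "wed", "asd", "swq"),
  B = c("dfg", "sde", "thy", "sde", "sde"),
  C = c("ghj", "cdf", "red", "grf", "hty"),
  stringsAsFactors = TRUE
)

# Same helper as above: factor -> character labels, then append the count
f <- function(x) {
  c(if (is.factor(x)) levels(x)[x] else x, length(unique(x)))
}

result <- data.frame(lapply(df, f), stringsAsFactors = FALSE)

sapply(result, class)  # ID stays integer; A, B and C are character
result[6, ]            # the appended row of unique counts
```

Because the counts in A, B and C are now stored as strings, convert them with as.integer() before doing arithmetic on that row.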

Upvotes: 1
