skan
skan

Reputation: 7720

Generate a table of values for several columns

Say I have a dataframe or datatable.

For example:

try <- data.frame(AA=c(1,2,3,1,2,3,4,5,NA),BB=c(1,2,2,NA,
2,1,2,2,NA), CC=c("A","B", NA, NA, "A","B", "A","C","B"))
setDT(try)

 AA BB CC
 1  1   A
 2  2   B
 3  2   NA
 1 NA   NA
 2  2   A
 3  1   B
 4  2   A
 5  2   C
 NA NA  B

I want to summarize the values in order to export them to an Excel file for further manipulation later.

I could create a table for each column but, in real life, some variable could have too many different values (such as the weight or DOB of people).

I can get the first six value for a single column with:

table(try$BB,  useNA ="ifany") 

1    2 <NA> 
2    5    2 

But when I try to do it automatically for all the columns at once it doesn't work as expected:

try[,lapply(.SD,function(x) table(x,useNA="ifany")[1:6] )]

because the table() command generates a 2 rows result and only one is used to create the final summary table.

What procedure do you suggest to keep that information?

For example I could try to convert that single-variable tables to something like

"1":2    "2":5 "NA":2

But I don't know how to do it. Maybe converting it to factors, maybe pasting the values. I'm not even able to extract the rows of the table for further manipulation. Any solution with base data.frame or date.table is welcome.

Or I could even order that table to get the most common values first.

PD: I want somethin like this:

AA  "1":2   "2":2   "3":2   "4":1   "5":1   "NA": 1
BB  "1":2   "2":5   "NA": 2         
CC  "A":3   "B":3   "C":1   "NA": 2     

PD2: I've tried this

try[,lapply(.SD, function(x) { tmp <- table(x, 
     useNA ="ifany") ;  mapply(paste0, names( tmp ), 
     rep(":", length(tmp)), tmp )} )
   ]

But it's too long and it doesn't work well

 AA   BB   CC
1:2  1:2  A:3
2:2  2:5  B:3
3:2 NA:2  C:1
4:1  1:2  NA:2
5:1  1:2  A:3
NA:1  2:5  B:3

It fills the last values with fake values.

Another option would be to interleave the names and the values.

In this example I should get:

  AA   BB   CC
 "1:2"  "1:2"  "A:3"
 "2:2"  "2:5"  "B:3"
 "3:2"  "NA:2" "C:1"
 "4:1"    NA   "NA:2"
 "5:1"    NA    NA
 "NA:1"   NA    NA

The problem is that the list is converted internally to a datatable by the command as.data.table.list() and the different size vectors are recycled instead of filled with NAs.

Upvotes: 0

Views: 181

Answers (2)

skan
skan

Reputation: 7720

This is my data.table solution with some ideas from Frank.

siz <- 6
try[,lapply(.SD, function(x) { tmp <- table(x, 
useNA ="ifany") ;  tmp2 <- c(tmp[is.na(names(tmp))],
rev(sort(tmp[!is.na(names(tmp))]))); 
tmp3 <- mapply(paste0, names( tmp2 ),rep(":",
length(tmp2)),tmp2); length(tmp3)<-siz; tmp3})
]

It places the NAs always at the beginning and order the other elements from the most common to the least common.

Maybe there are some simpler ways to summarize the information.

Upvotes: 1

Frank
Frank

Reputation: 66819

You can get your desired output with

library(magrittr)
tab = try %>% lapply(table, useNA = "ifany")
len = max(lengths(tab))

tab %>% lapply(
    . %>% 
    { paste0(names(.), ":", .) } %>% 
    `length<-`(len)
) %>% setDF %>% print


    AA   BB   CC
1  1:2  1:2  A:3
2  2:2  2:5  B:3
3  3:2 NA:2  C:1
4  4:1 <NA> NA:2
5  5:1 <NA> <NA>
6 NA:1 <NA> <NA>

I haven't learned purrr, but if you like using pipes, that might offer somewhat cleaner code.

Upvotes: 1

Related Questions