Reputation: 7720
Say I have a dataframe or datatable.
For example:
try <- data.frame(AA=c(1,2,3,1,2,3,4,5,NA),BB=c(1,2,2,NA,
2,1,2,2,NA), CC=c("A","B", NA, NA, "A","B", "A","C","B"))
setDT(try)
AA BB CC
1 1 A
2 2 B
3 2 NA
1 NA NA
2 2 A
3 1 B
4 2 A
5 2 C
NA NA B
I want to summarize the values in order to export them to an Excel file for further manipulation later.
I could create a table for each column but, in real life, some variable could have too many different values (such as the weight or DOB of people).
I can get the first six value for a single column with:
table(try$BB, useNA ="ifany")
1 2 <NA>
2 5 2
But when I try to do it automatically for all the columns at once it doesn't work as expected:
try[,lapply(.SD,function(x) table(x,useNA="ifany")[1:6] )]
because the table() command generates a 2 rows result and only one is used to create the final summary table.
What procedure do you suggest to keep that information?
For example I could try to convert that single-variable tables to something like
"1":2 "2":5 "NA":2
But I don't know how to do it. Maybe converting it to factors, maybe pasting the values. I'm not even able to extract the rows of the table for further manipulation. Any solution with base data.frame or date.table is welcome.
Or I could even order that table to get the most common values first.
PD: I want somethin like this:
AA "1":2 "2":2 "3":2 "4":1 "5":1 "NA": 1
BB "1":2 "2":5 "NA": 2
CC "A":3 "B":3 "C":1 "NA": 2
PD2: I've tried this
try[,lapply(.SD, function(x) { tmp <- table(x,
useNA ="ifany") ; mapply(paste0, names( tmp ),
rep(":", length(tmp)), tmp )} )
]
But it's too long and it doesn't work well
AA BB CC
1:2 1:2 A:3
2:2 2:5 B:3
3:2 NA:2 C:1
4:1 1:2 NA:2
5:1 1:2 A:3
NA:1 2:5 B:3
It fills the last values with fake values.
Another option would be to interleave the names and the values.
In this example I should get:
AA BB CC
"1:2" "1:2" "A:3"
"2:2" "2:5" "B:3"
"3:2" "NA:2" "C:1"
"4:1" NA "NA:2"
"5:1" NA NA
"NA:1" NA NA
The problem is that the list is converted internally to a datatable by the command as.data.table.list() and the different size vectors are recycled instead of filled with NAs.
Upvotes: 0
Views: 181
Reputation: 7720
This is my data.table solution with some ideas from Frank.
siz <- 6
try[,lapply(.SD, function(x) { tmp <- table(x,
useNA ="ifany") ; tmp2 <- c(tmp[is.na(names(tmp))],
rev(sort(tmp[!is.na(names(tmp))])));
tmp3 <- mapply(paste0, names( tmp2 ),rep(":",
length(tmp2)),tmp2); length(tmp3)<-siz; tmp3})
]
It places the NAs always at the beginning and order the other elements from the most common to the least common.
Maybe there are some simpler ways to summarize the information.
Upvotes: 1
Reputation: 66819
You can get your desired output with
library(magrittr)
tab = try %>% lapply(table, useNA = "ifany")
len = max(lengths(tab))
tab %>% lapply(
. %>%
{ paste0(names(.), ":", .) } %>%
`length<-`(len)
) %>% setDF %>% print
AA BB CC
1 1:2 1:2 A:3
2 2:2 2:5 B:3
3 3:2 NA:2 C:1
4 4:1 <NA> NA:2
5 5:1 <NA> <NA>
6 NA:1 <NA> <NA>
I haven't learned purrr, but if you like using pipes, that might offer somewhat cleaner code.
Upvotes: 1