Antje Rosebrock
Antje Rosebrock

Reputation: 15

aggregate data and list frequencies of a variable in a string

I have the following datatable

dt<-data.table(string=c("A","A","A","B","B","C"), source=c("a","b","c","a","b","a"), N=rep(1,6))
    string source  N
1:      A      a   1
2:      A      b   1
3:      A      c   1
4:      B      a   1
5:      B      b   1
6:      C      a   1

What I would like to have, is an aggregated table of the following form:

   string   a_b_c
1:      A   1|1|1
2:      B   1|1|0
3:      C   1|0|0

That means, I would like to drop the column with the source of N, but keep this information in a new column that contains information whether we have information from source a|b|c. The problem is that I do not know how to add the missing 0 if we do not have this information in the original dt.

If I try the code dt <- dt[, paste(N, collapse="|"), by = string], I get something like this:

   string     V1
 1:      A  1|1|1
 2:      B    1|1
 3:      C      1

Thank you very much for your help!

Upvotes: 0

Views: 69

Answers (2)

Onyambu
Onyambu

Reputation: 79228

library(data.table)
dcast(dt,string~source,fill=0,value.var = "N")[,paste0(.SD,collapse = "|"),by=string]
   string    V1
1:      A 1|1|1
2:      B 1|1|0
3:      C 1|0|0

Upvotes: 1

R. Prost
R. Prost

Reputation: 2088

Something like that ? Not as neat as the one liner from the comments.. (with dplyr and data.table which you already use)

dt2 <- dt %>% 
   data.table::dcast(string ~ source) %>% 
   mutate(a_b_c = paste(a, b, c, sep="|")) %>%
   select(string, a_b_c)

giving

string a_b_c
1      A 1|1|1
2      B 1|1|0
3      C 1|0|0

Upvotes: 1

Related Questions