relu
relu

Reputation: 353

Transforming long table into wide format with counts for only one column

I have a table in long format as shown below, every row is unique in this input table :-

 year variable
  2014   ab  
  2014   cd  
  2014   ef 
  2016   ef 
  2016   gh
  2014   ab  
  2014   cd  
  2014   ef 
  2016   ef 
  2016   gh

I would like convert this table into wide format but only for variable column which looks like a contigency matrix. For example - As shown in the output table below, the combination of ab+cd appears ONCE for year 2014, and the combination of ab+ef also appears Once for year 2014. In that way, the first row of my output table clearly shows all the Counts for different combinations variable column of input table for different years.

year    value  ab  cd  ef  gh  
2014    ab     2    2   2   0
2014    cd     2    2   2   0 
2014    ef     2    2   2   0 
2014    ef     0    0   2   2 
2016    gh     0    0   2   2 

I have tried to reshape table quite a few times but have not been able to achieve the result that I want to have. I would very much appreciate if the solution includes use of data.table. Thank you.

Upvotes: 3

Views: 448

Answers (4)

paqmo
paqmo

Reputation: 3729

Assuming that you want ef-cd cells to be 1 and not 0, here's a slightly tortured approach using igraph and tidyverse. The idea is to create a bipartite graph, find the 1-mode projection, and create an adjacency matrix from that projection:

library(tidyverse)
library(igraph)

df <- tibble(year = c("2014",
                "2014",
                "2014",
                "2016",
                "2016"),
             variable = c("ab",
                          "cd",
                          "ef",
                          "ef",
                          "gh"))

tab <- df %>% 
  group_split(year) %>% 
  map(~ .x %>% 
        graph_from_data_frame(directed = FALSE) %>% 
        set_vertex_attr("type", value = ifelse(V(.)$name %in% .x$year, TRUE, FALSE)) %>% 
        bipartite_projection(which = FALSE) %>% 
        add_edges(rep(1:length(unique(.x$variable)), 2) %>% sort()) %>% 
        as_adjacency_matrix(sparse = FALSE) %>% 
        as_tibble()) %>% 
  bind_rows() %>% 
  mutate_all(coalesce, 0)

cbind(df, tab)
#>   year variable ab cd ef gh
#> 1 2014       ab  1  1  1  0
#> 2 2014       cd  1  1  1  0
#> 3 2014       ef  1  1  1  0
#> 4 2016       ef  0  0  1  1
#> 5 2016       gh  0  0  1  1

Created on 2020-04-14 by the reprex package (v0.3.0)

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

Here is an option using data.table:

vs <- DT[, unique(variable)]
for (x in vs) set(DT, j=x, value=0L)
DT[, (vs) := {
        m <- as.matrix(.SD)
        m[, match(variable, vs)] <- 1L
        as.data.table(m)
    }, year, .SD=vs]
DT

And also:

DT[, (vs) := {
        m <- copy(.SD)
        m[, match(variable, vs)] <- 1L
        m
    }, year, .SD=vs]

output as per problem description (as pointed out by Daniel O and meriops, there are some inconsistencies between the desired output and problem description):

   year variable ab cd ef gh
1: 2014       ab  1  1  1  0
2: 2014       cd  1  1  1  0
3: 2014       ef  1  1  1  0
4: 2016       ef  0  0  1  1
5: 2016       gh  0  0  1  1

data:

library(data.table)
DT <- fread("year variable
2014   ab  
2014   cd  
2014   ef 
2016   ef 
2016   gh")

Upvotes: 2

Daniel O
Daniel O

Reputation: 4358

Here is some code in Base R that accomplishes this task

df_new <- t(sapply(unique(df$year), function(X) lapply(unique(df$variable), function(Y)  length(which(df$variable== Y & df$year == X)))))
row.names(df_new) <- unique(df$year)
colnames(df_new) <- unique(df$variable)

output:

     ab cd ef gh
2014 1  1  1  0 
2016 0  0  1  1 

Input Data:

Input = (
  ' year variable
  2014   ab  
  2014   cd  
  2014   ef 
  2016   ef 
  2016   gh')
df = read.table(textConnection(Input), header = T)

Upvotes: -1

Prahlad
Prahlad

Reputation: 138

df <- data.table(df)
df_dcast <- dcast.data.table(df,year~value,fun=length)

Upvotes: -1

Related Questions