Reputation: 353
I have a table in long format as shown below, every row is unique in this input table :-
year variable
2014 ab
2014 cd
2014 ef
2016 ef
2016 gh
2014 ab
2014 cd
2014 ef
2016 ef
2016 gh
I would like convert this table into wide format but only for variable
column which looks like a contigency matrix. For example - As shown in the output table below, the combination of ab+cd appears ONCE for year 2014, and the combination of ab+ef also appears Once for year 2014. In that way, the first row of my output table clearly shows all the Counts
for different combinations variable
column of input table for different years.
year value ab cd ef gh
2014 ab 2 2 2 0
2014 cd 2 2 2 0
2014 ef 2 2 2 0
2014 ef 0 0 2 2
2016 gh 0 0 2 2
I have tried to reshape table quite a few times but have not been able to achieve the result that I want to have. I would very much appreciate if the solution includes use of data.table
. Thank you.
Upvotes: 3
Views: 448
Reputation: 3729
Assuming that you want ef-cd cells to be 1
and not 0
, here's a slightly tortured approach using igraph
and tidyverse
. The idea is to create a bipartite graph, find the 1-mode projection, and create an adjacency matrix from that projection:
library(tidyverse)
library(igraph)
df <- tibble(year = c("2014",
"2014",
"2014",
"2016",
"2016"),
variable = c("ab",
"cd",
"ef",
"ef",
"gh"))
tab <- df %>%
group_split(year) %>%
map(~ .x %>%
graph_from_data_frame(directed = FALSE) %>%
set_vertex_attr("type", value = ifelse(V(.)$name %in% .x$year, TRUE, FALSE)) %>%
bipartite_projection(which = FALSE) %>%
add_edges(rep(1:length(unique(.x$variable)), 2) %>% sort()) %>%
as_adjacency_matrix(sparse = FALSE) %>%
as_tibble()) %>%
bind_rows() %>%
mutate_all(coalesce, 0)
cbind(df, tab)
#> year variable ab cd ef gh
#> 1 2014 ab 1 1 1 0
#> 2 2014 cd 1 1 1 0
#> 3 2014 ef 1 1 1 0
#> 4 2016 ef 0 0 1 1
#> 5 2016 gh 0 0 1 1
Created on 2020-04-14 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 25225
Here is an option using data.table
:
vs <- DT[, unique(variable)]
for (x in vs) set(DT, j=x, value=0L)
DT[, (vs) := {
m <- as.matrix(.SD)
m[, match(variable, vs)] <- 1L
as.data.table(m)
}, year, .SD=vs]
DT
And also:
DT[, (vs) := {
m <- copy(.SD)
m[, match(variable, vs)] <- 1L
m
}, year, .SD=vs]
output as per problem description (as pointed out by Daniel O and meriops, there are some inconsistencies between the desired output and problem description):
year variable ab cd ef gh
1: 2014 ab 1 1 1 0
2: 2014 cd 1 1 1 0
3: 2014 ef 1 1 1 0
4: 2016 ef 0 0 1 1
5: 2016 gh 0 0 1 1
data:
library(data.table)
DT <- fread("year variable
2014 ab
2014 cd
2014 ef
2016 ef
2016 gh")
Upvotes: 2
Reputation: 4358
Here is some code in Base R
that accomplishes this task
df_new <- t(sapply(unique(df$year), function(X) lapply(unique(df$variable), function(Y) length(which(df$variable== Y & df$year == X)))))
row.names(df_new) <- unique(df$year)
colnames(df_new) <- unique(df$variable)
output:
ab cd ef gh
2014 1 1 1 0
2016 0 0 1 1
Input Data:
Input = (
' year variable
2014 ab
2014 cd
2014 ef
2016 ef
2016 gh')
df = read.table(textConnection(Input), header = T)
Upvotes: -1
Reputation: 138
df <- data.table(df)
df_dcast <- dcast.data.table(df,year~value,fun=length)
Upvotes: -1