Shahin

Reputation: 1316

Formatting a data.frame with binary values

I have a data frame with 4 columns and 4 rows. For simplicity, I converted it to numeric format. The data are as follows:

df <- structure(list(a = c(1,2,2,0),
                     b = c(2,1,0,2),
                     c = c(2,2,1,1),
                     d = c(0,2,0,1)),row.names=c(NA,-4L) ,class = "data.frame")
  a b c d
1 1 2 2 0
2 2 1 2 2
3 2 0 1 0
4 0 2 1 1

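I would like to reshape this data frame so that, for each row, column 1 lists the names of the original columns holding the value 1 and column 2 lists those holding the value 2 (joined by "/", ignoring 0):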

   1     2
1  a     b/c
2  b     a/c/d
3  c     a
4  c/d   b

Is there a function or package I should look into? I have been doing lots of text processing in R recently. I'd appreciate your assistance!

Upvotes: 2

Views: 137

Answers (2)

thelatemail

Reputation: 93938

tapply fun with some row and col indexes (stealing df from Ronak's answer):

tapply(
  colnames(df)[col(df)],
  list(row(df), unlist(df)),
  FUN=paste, collapse="/"
)[,-1]

#  1     2      
#1 "a"   "b/c"  
#2 "b"   "a/c/d"
#3 "c"   "a"    
#4 "c/d" "b" 

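Basically I'm taking one long vector holding the column name of every cell in df, and tabulating it by the combination of the row of df and the original values in df, pasting together the names that land in the same cell. The [,-1] at the end drops the first column of the result, which holds the names for the value 0.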
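To make the intermediate pieces visible, here is a minimal sketch of the same idea with the three parallel vectors laid out side by side. The names cells, full and res are only illustrative, and the data are the df from Ronak's answer; dropping the 0 column by name instead of by position is my own variation, not part of the answer above.

```r
# Data as given in the dplyr answer on this page.
df <- data.frame(a = c(1L, 2L, 2L, 0L), b = c(2L, 1L, 0L, 2L),
                 c = c(2L, 2L, 1L, 1L), d = c(0L, 2L, 0L, 1L))

# Three parallel vectors, one element per cell of df (column-major order).
# "cells" is a hypothetical helper name used only for this illustration.
cells <- data.frame(
  name  = colnames(df)[col(df)],          # column name of each cell
  row   = as.vector(row(df)),             # row number of each cell
  value = unlist(df, use.names = FALSE)   # value stored in each cell
)
head(cells, 5)

# Cross-tabulate: one row per df row, one column per distinct value (0, 1, 2).
# Names that fall into the same row/value combination are pasted with "/".
full <- tapply(cells$name, list(cells$row, cells$value),
               FUN = paste, collapse = "/")

# Drop the column for value 0 by name rather than by position.
res <- full[, colnames(full) != "0", drop = FALSE]
res
```

Selecting by name is a small safety net: if the data happened to contain no 0 at all, [,-1] would silently drop the column for value 1 instead.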

Upvotes: 4

Ronak Shah

Reputation: 389235

One way with dplyr and tidyr is to reshape the data to long format, remove the 0 values, and paste the column names together for each row and value combination. Finally, reshape the data back to wide format.

library(dplyr)
library(tidyr)

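df %>%
  mutate(row = row_number()) %>%                 # keep track of the original row
  pivot_longer(cols = -row) %>%                  # long format: one line per cell
  filter(value != 0) %>%                         # drop the 0 cells
  group_by(row, value) %>%
  summarise(val = paste(name, collapse = "/"),   # join column names per row/value
            .groups = "drop") %>%
  pivot_wider(names_from = value, values_from = val)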

#    row `1`   `2`  
#  <int> <chr> <chr>
#1     1 a     b/c  
#2     2 b     a/c/d
#3     3 c     a    
#4     4 c/d   b    

data

df <- structure(list(a = c(1L, 2L, 2L, 0L), b = c(2L, 1L, 0L, 2L), 
c = c(2L, 2L, 1L, 1L), d = c(0L, 2L, 0L, 1L)), class = "data.frame", 
row.names = c("1", "2", "3", "4"))

Upvotes: 3
