Reputation: 1316
I have a dataframe with 4 columns and 4 rows. For simplicity, I changed it to numeric format. The schema is as follows:
df <- structure(list(a = c(1,2,2,0),
                     b = c(2,1,0,2),
                     c = c(2,2,1,1),
                     d = c(0,2,0,1)), row.names = c(NA,-4L), class = "data.frame")
  a b c d
1 1 2 2 0
2 2 1 2 2
3 2 0 1 0
4 0 2 1 1
For each row, I would like to list the column names that hold each non-zero value (1 or 2), joined by "/", and obtain the following:
    1     2
1   a   b/c
2   b a/c/d
3   c     a
4 c/d     b
Is there a function or package I should look into? I have been doing lots of text processing in R recently. I'd appreciate your assistance!
Upvotes: 2
Views: 137
Reputation: 93938
tapply() fun with some row() and col() indexes (stealing df from Ronak's answer):
tapply(
  colnames(df)[col(df)],
  list(row(df), unlist(df)),
  FUN = paste, collapse = "/"
)[, -1]
#  1     2
#1 "a"   "b/c"
#2 "b"   "a/c/d"
#3 "c"   "a"
#4 "c/d" "b"
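Basically I'm taking one long vector holding the column name of every cell in df, and tabulating it by the combination of the row of df and the original values in df. The trailing [, -1] drops the first column of the result, which collects the names of the cells whose value is 0.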
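To make the three vectors involved easier to see, here is a minimal sketch that builds them one at a time before calling tapply(). The intermediate names nms, rows, vals, cells and res are made up for illustration and are not part of the answer; df is the one from Ronak's answer.

```r
# Data from Ronak's answer
df <- data.frame(a = c(1L, 2L, 2L, 0L), b = c(2L, 1L, 0L, 2L),
                 c = c(2L, 2L, 1L, 1L), d = c(0L, 2L, 0L, 1L))

# One entry per cell of df, all three in the same column-by-column order
nms  <- colnames(df)[col(df)]          # column name of each cell
rows <- as.vector(row(df))             # row index of each cell
vals <- unlist(df, use.names = FALSE)  # value of each cell

# Side-by-side view of what is being tabulated (illustrative only)
cells <- data.frame(name = nms, row = rows, value = vals)
print(head(cells, 4))

# Paste together the names that share the same (row, value) combination
res <- tapply(nms, list(rows, vals), FUN = paste, collapse = "/")
print(colnames(res))  # "0" "1" "2"

# Drop the "0" column to keep only the values 1 and 2
print(res[, -1])
```

Combinations that never occur, such as a 0 in row 2, come back as NA in the full result; here they all sit in the "0" column that gets dropped.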
Upvotes: 4
Reputation: 389235
One way with dplyr and tidyr could be to get the data in long format, remove the 0 values and paste the column names together for each row and value combination. Finally, get the data back in wide format.
library(dplyr)
library(tidyr)
df %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row) %>%
  filter(value != 0) %>%
  group_by(row, value) %>%
  summarise(val = paste(name, collapse = "/")) %>%
  pivot_wider(names_from = value, values_from = val)
#    row `1`   `2`
#  <int> <chr> <chr>
#1     1 a     b/c
#2     2 b     a/c/d
#3     3 c     a
#4     4 c/d   b
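For readers who want to follow the long/filter/paste/wide steps without loading any packages, the same idea can be sketched in base R. This is only an illustration under the assumption that aggregate() and reshape() are acceptable stand-ins for the dplyr/tidyr verbs; it is not the code from this answer, and the names long, agg and wide are made up.

```r
# Same data as in the "data" block of this answer
df <- data.frame(a = c(1L, 2L, 2L, 0L), b = c(2L, 1L, 0L, 2L),
                 c = c(2L, 2L, 1L, 1L), d = c(0L, 2L, 0L, 1L))

# Long format: one record per cell (row index, column name, value)
long <- data.frame(
  row   = as.vector(row(df)),
  name  = names(df)[as.vector(col(df))],
  value = unlist(df, use.names = FALSE)
)

# Remove the 0 values
long <- long[long$value != 0, ]

# Paste the column names together for each row and value combination
agg <- aggregate(name ~ row + value, data = long, FUN = paste, collapse = "/")

# Back to wide format: one column per value (named name.1 and name.2)
wide <- reshape(agg, idvar = "row", timevar = "value", direction = "wide")
print(wide)
```

Unlike pivot_wider(), reshape() prefixes the new columns with the name of the value column, so the result has name.1 and name.2 rather than `1` and `2`.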
data
df <- structure(list(a = c(1L, 2L, 2L, 0L), b = c(2L, 1L, 0L, 2L),
c = c(2L, 2L, 1L, 1L), d = c(0L, 2L, 0L, 1L)), class = "data.frame",
row.names = c("1", "2", "3", "4"))
Upvotes: 3