Reputation: 110
I am trying to reshape a dataframe using tidyR .Below is the dataframe:
data <- data.frame(class_name=c("date","date","educational","qualif","date","date", "educational","qualif"),
text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
"Master of Science, Computer Science","1996","2000",
"MAHARASHTRA INSTITUTE OF TECHNOLOGY",
"Bachelor of Science, Mechanical Engineering"))
I would like the data to look like the below image:
Upvotes: 1
Views: 209
Reputation: 42592
For the sake of completeness, here is also a solution using dcast()
from the data.table
package:
library(data.table)
setDT(data)[, rn := .I + 3L][
, dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")]
rn date educational qualif 1: 1 2000, 2003 ILLINOIS INSTITUTE OF TECHNOLOGY Master of Science, Computer Science 2: 2 1996, 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering
Note that toString()
is used as aggregation function so that the duplicate dates are concatenated in one column. This is motivated by the fact that the two date
columns in OP's expected output share the same name which may indicate that the expected output is only for display and no further processing of the date
values is required.
If column order matters and rn
isn't required, the output can be beautified to better match OP's expected result:
lvl <- c("educational", "qualif", "date")
setDT(data)[, rn := .I + 3L][, class_name := factor(class_name, levels = lvl)][
, dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")][, rn := NULL][]
educational qualif date 1: ILLINOIS INSTITUTE OF TECHNOLOGY Master of Science, Computer Science 2000, 2003 2: MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996, 2000
Upvotes: 0
Reputation: 24272
Another solution using reshape
(less elegant than Sotos' solution):
data <- data.frame(class_name=c("date","date","educational","qualif","date","date", "educational","qualif"),
text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
"Master of Science, Computer Science","1996","2000",
"MAHARASHTRA INSTITUTE OF TECHNOLOGY",
"Bachelor of Science, Mechanical Engineering"))
nrec <- 4
data$id <- rep(1:2, each=nrec)
data$time <- rep(1:4, nrow(data)/nrec)
df <- reshape(data, v.names="text_val", idvar="id", direction="wide")[,-1]
names(df) <- c("id","date1","date2","educational","qualif")
df
# id date1 date2 educational qualif
# 1 1 2000 2003 ILLINOIS INSTITUTE OF TECHNOLOGY Master of Science, Computer Science
# 5 2 1996 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering
Upvotes: 1
Reputation: 51612
Here is an idea using tidyverse
. We basically group every 4 rows and spread. However, we need to make names in class_name
unique first, i.e.
library(tidyverse)
data %>%
group_by(grp = rep(seq(n()/4), each = 4)) %>%
mutate(class_name = make.unique(as.character(class_name))) %>%
spread(class_name, text_val) %>%
ungroup() %>%
select(educational, qualif, date, date.1)
Which gives,
# A tibble: 2 x 4 educational qualif date date.1 * <fctr> <fctr> <fctr> <fctr> 1 ILLINOIS INSTITUTE OF TECHNOLOGY Master of Science, Computer Science 2000 2003 2 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996 2000
Upvotes: 3