Vishnu
Vishnu

Reputation: 110

reshaping duplicate rows as column headers

I am trying to reshape a dataframe using tidyR .Below is the dataframe:

data <- data.frame(class_name=c("date","date","educational","qualif","date","date",               "educational","qualif"),
        text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
           "Master of Science, Computer Science","1996","2000",
           "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
           "Bachelor of Science, Mechanical Engineering"))

I would like the data to look like the below image:

1

Upvotes: 1

Views: 209

Answers (3)

Uwe
Uwe

Reputation: 42592

For the sake of completeness, here is also a solution using dcast() from the data.table package:

library(data.table)
setDT(data)[, rn := .I + 3L][
  , dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")]
   rn       date                         educational                                      qualif
1:  1 2000, 2003    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science
2:  2 1996, 2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering

Note that toString() is used as aggregation function so that the duplicate dates are concatenated in one column. This is motivated by the fact that the two date columns in OP's expected output share the same name which may indicate that the expected output is only for display and no further processing of the date values is required.


If column order matters and rn isn't required, the output can be beautified to better match OP's expected result:

lvl <- c("educational", "qualif", "date")
setDT(data)[, rn := .I + 3L][, class_name := factor(class_name, levels = lvl)][
  , dcast(.SD , rn %/% 4L ~ class_name, toString, value.var = "text_val")][, rn := NULL][]
                           educational                                      qualif       date
1:    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science 2000, 2003
2: MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering 1996, 2000

Upvotes: 0

Marco Sandri
Marco Sandri

Reputation: 24272

Another solution using reshape (less elegant than Sotos' solution):

data <- data.frame(class_name=c("date","date","educational","qualif","date","date",               "educational","qualif"),
        text_val=c("2000","2003","ILLINOIS INSTITUTE OF TECHNOLOGY",
           "Master of Science, Computer Science","1996","2000",
           "MAHARASHTRA INSTITUTE OF TECHNOLOGY",
           "Bachelor of Science, Mechanical Engineering"))
nrec <- 4
data$id <- rep(1:2, each=nrec)
data$time <- rep(1:4, nrow(data)/nrec)

df <- reshape(data, v.names="text_val", idvar="id", direction="wide")[,-1]
names(df) <- c("id","date1","date2","educational","qualif")
df

#   id date1 date2                         educational                                      qualif
# 1  1  2000  2003    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science
# 5  2  1996  2000 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering

Upvotes: 1

Sotos
Sotos

Reputation: 51612

Here is an idea using tidyverse. We basically group every 4 rows and spread. However, we need to make names in class_name unique first, i.e.

library(tidyverse)

data %>% 
    group_by(grp = rep(seq(n()/4), each = 4)) %>% 
    mutate(class_name = make.unique(as.character(class_name))) %>% 
    spread(class_name, text_val) %>% 
    ungroup() %>% 
    select(educational, qualif, date, date.1)

Which gives,

# A tibble: 2 x 4
                          educational                                      qualif   date date.1
*                              <fctr>                                      <fctr> <fctr> <fctr>
1    ILLINOIS INSTITUTE OF TECHNOLOGY         Master of Science, Computer Science   2000   2003
2 MAHARASHTRA INSTITUTE OF TECHNOLOGY Bachelor of Science, Mechanical Engineering   1996   2000

Upvotes: 3

Related Questions