shiva
shiva

Reputation: 115

Using R to manipulate dataframe: each row of a column to separate columns

I have a longer dataframe with student name, subjects, question names and their marks. A short version of this dataframe looks like the below:

c1 c2 c3 c4 
A  1  a   1
A  1  b   0.5
A  1  c   1
A  2  a   2
A  2  b   1.5
A  2  c   3
A  2  d   3
B  1  a   0
B  1  b   1.5
B  1  c   2
B  2  a   2
B  2  b   2.5
B  2  c   4
B  2  d   5

Here A, B are students, 1,2 are subjects, a, b,c are question names and the last column is marks obtained by each student, in each subject and in each question.

I would like to consolidate part of dataframe with question name column becoming heading for several columns with corresponding marks like below:

Student         Sub:1         Sub:2
Name       a    b     c     a   b  c  d
A          1   0.5    1     2  1.5 3  3
B          0   1.5    2     2  2.5 4  5

I can use transpose function (t(dataframe)) to switch column to rows but I don't know how to do partly retaining details of students and marks. Can some one guide how can I achieve this?

Upvotes: 2

Views: 134

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101663

Here is base R solution using reshape()

df$namecol <- with(df,paste0(c2,c3))
dfout <- reshape(df[-c(2:3)],direction = "wide",idvar = "c1",timevar = "namecol")
names(dfout) <- gsub(".*\\.(.*)","\\1",names(dfout))

such that

> dfout
  c1 1a  1b 1c 2a  2b 2c 2d
1  A  1 0.5  1  2 1.5  3  3
8  B  0 1.5  2  2 2.5  4  5

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389012

We can use pivot_wider, supplying column names from c2 and c3 columns and value from c4.

tidyr::pivot_wider(df,names_from = c(c2, c3),values_from = c4, names_prefix = "Sub")

#  c1    Sub1_a Sub1_b Sub1_c Sub2_a Sub2_b Sub2_c Sub2_d
#  <fct>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1 A          1    0.5      1      2    1.5      3      3
#2 B          0    1.5      2      2    2.5      4      5

Or with data.table, dcast

data.table::dcast(df, c1~paste0("Sub",c2)+c3, value.var = "c4")

data

df <- structure(list(c1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
c2 = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), c3 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 1L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), 
c4 = c(1, 0.5, 1, 2, 1.5, 3, 3, 0, 1.5, 2, 2, 2.5, 4, 5)), class = "data.frame", 
row.names = c(NA, -14L))

Upvotes: 2

Related Questions