Gowtham SB
Gowtham SB

Reputation: 332

Transposing columns and Rows in R programming

INPUT is a csv file which contains the below.

Athlete Age Country Year    Closing Ceremony Date   Sport   Gold    Silver  Bronze
Michael Phelps  23  United States   2008    8/24/2008   Swimming    8   0   0
Michael Phelps  19  United States   2004    8/29/2004   Swimming    6   0   2
Michael Phelps  27  United States   2012    8/12/2012   Swimming    4   2   0
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    1   2   3

EXPECTED OUTPUT i need to get

Athlete Age Country Year    Closing Ceremony Date   Sport   Metal Type  Metal Count
Michael Phelps  23  United States   2008    8/24/2008   Swimming    Gold    8
Michael Phelps  19  United States   2004    8/29/2004   Swimming    Gold    6
Michael Phelps  19  United States   2004    8/29/2004   Swimming    Bronze  2
Michael Phelps  27  United States   2012    8/12/2012   Swimming    Gold    4
Michael Phelps  27  United States   2012    8/12/2012   Swimming    Silver  2
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Gold    1
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Silver  2
Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Bronze  3

Can any one help on this

But i was trying this with melt but output is not expected one .I used this command

> g <- melt(data)

I am Getting like this

           Athlete       Country Closing.Ceremony.Date    Sport variable value
1    Michael Phelps United States             8/24/2008 Swimming      Age    23
2    Michael Phelps United States             8/29/2004 Swimming      Age    19
3    Michael Phelps United States             8/12/2012 Swimming      Age    27
4  Natalie Coughlin United States             8/24/2008 Swimming      Age    25
5    Michael Phelps United States             8/24/2008 Swimming     Year  2008
6    Michael Phelps United States             8/29/2004 Swimming     Year  2004
7    Michael Phelps United States             8/12/2012 Swimming     Year  2012
8  Natalie Coughlin United States             8/24/2008 Swimming     Year  2008
9    Michael Phelps United States             8/24/2008 Swimming     Gold     8
10   Michael Phelps United States             8/29/2004 Swimming     Gold     6
11   Michael Phelps United States             8/12/2012 Swimming     Gold     4
12 Natalie Coughlin United States             8/24/2008 Swimming     Gold     1
13   Michael Phelps United States             8/24/2008 Swimming   Silver     0
14   Michael Phelps United States             8/29/2004 Swimming   Silver     0
15   Michael Phelps United States             8/12/2012 Swimming   Silver     2
16 Natalie Coughlin United States             8/24/2008 Swimming   Silver     2
17   Michael Phelps United States             8/24/2008 Swimming   Bronze     0
18   Michael Phelps United States             8/29/2004 Swimming   Bronze     2
19   Michael Phelps United States             8/12/2012 Swimming   Bronze     0
20 Natalie Coughlin United States             8/24/2008 Swimming   Bronze     3

But this not my expected output . Thanks in advance

Upvotes: 1

Views: 131

Answers (4)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

For variety, here's a concise approach in base R. Conceptually it is the same as the approach using melt (using @akrun's sample data):

subset(cbind(data[1:6], stack(data[-c(1:6)])), values > 0)
#             Athlete Age       Country Year Closing.Ceremony.Date    Sport
# 1    Michael Phelps  23 United States 2008             8/24/2008 Swimming
# 2    Michael Phelps  19 United States 2004             8/29/2004 Swimming
# 3    Michael Phelps  27 United States 2012             8/12/2012 Swimming
# 4  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
# 7    Michael Phelps  27 United States 2012             8/12/2012 Swimming
# 8  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
# 10   Michael Phelps  19 United States 2004             8/29/2004 Swimming
# 12 Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#    values    ind
# 1       8   Gold
# 2       6   Gold
# 3       4   Gold
# 4       1   Gold
# 7       2 Silver
# 8       2 Silver
# 10      2 Bronze
# 12      3 Bronze

Upvotes: 1

akrun
akrun

Reputation: 887118

We could use melt specifying the 'id' columns. Based on the expected output, the columns that are converted from 'wide' to 'long' form are 7:9 ie. Gold, Silver, Bronze columns. After we convert to long form, remove the rows that are '0' for the 'value' column with subset

library(reshape2)
subset(melt(data, id.var=1:6), value!=0)
#               Athlete Age       Country Year Closing.Ceremony.Date    Sport
#1    Michael Phelps  23 United States 2008             8/24/2008 Swimming
#2    Michael Phelps  19 United States 2004             8/29/2004 Swimming
#3    Michael Phelps  27 United States 2012             8/12/2012 Swimming
#4  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#7    Michael Phelps  27 United States 2012             8/12/2012 Swimming
#8  Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#10   Michael Phelps  19 United States 2004             8/29/2004 Swimming
#12 Natalie Coughlin  25 United States 2008             8/24/2008 Swimming
#   variable value
#1      Gold     8
#2      Gold     6
#3      Gold     4
#4      Gold     1
#7    Silver     2
#8    Silver     2
#10   Bronze     2
#12   Bronze     3

You could also use arguments such as value.name and var.name within the melt to change the 'default' variable/value column in the above.

Or the same could be done with gather from tidyr.

library(dplyr)
library(tidyr)
gather(data, Type, MedalCount, 7:9)  %>% 
                                   filter(MedalCount>0)

data

data <- structure(list(Athlete = c("Michael Phelps", "Michael Phelps", 
"Michael Phelps", "Natalie Coughlin"), Age = c(23L, 19L, 27L, 
25L), Country = c("United States", "United States", "United States", 
"United States"), Year = c(2008L, 2004L, 2012L, 2008L),
Closing.Ceremony.Date = c("8/24/2008", 
"8/29/2004", "8/12/2012", "8/24/2008"), Sport = c("Swimming", 
"Swimming", "Swimming", "Swimming"), Gold = c(8L, 6L, 4L, 1L), 
Silver = c(0L, 0L, 2L, 2L), Bronze = c(0L, 2L, 0L, 3L)),
.Names = c("Athlete", 
"Age", "Country", "Year", "Closing.Ceremony.Date", "Sport", "Gold", 
"Silver", "Bronze"), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1

bgoldst
bgoldst

Reputation: 35314

Try:

r> input <- data.frame(Athlete=c('Michael Phelps','Michael Phelps','Michael Phelps','Natalie Coughlin'), Age=c(23L,19L,27L,25L), Country=c('United States','United States','United States','United States'), Year=c(2008L,2004L,2012L,2008L), ClosingCeremonyDate=c(as.Date('2008-8-24'),as.Date('2004-8-29'),as.Date('2012-8-12'),as.Date('2008-8-24')), Sport=c('Swimming','Swimming','Swimming','Swimming'), Gold=c(8L,6L,4L,1L), Silver=c(0L,0L,2L,2L), Bronze=c(0L,2L,0L,3L) );
r> output <- subset(do.call(rbind, lapply(c('Bronze','Silver','Gold'), function(m) cbind(input[,-which(colnames(input)%in%c('Bronze','Silver','Gold'))], MedalType=m, MedalCount=input[,which(colnames(input)==m)] ) ) ), MedalCount>0 );
r> print(output, row.names=F );
          Athlete Age       Country Year ClosingCeremonyDate    Sport MedalType MedalCount
   Michael Phelps  19 United States 2004          2004-08-29 Swimming    Bronze          2
 Natalie Coughlin  25 United States 2008          2008-08-24 Swimming    Bronze          3
   Michael Phelps  27 United States 2012          2012-08-12 Swimming    Silver          2
 Natalie Coughlin  25 United States 2008          2008-08-24 Swimming    Silver          2
   Michael Phelps  23 United States 2008          2008-08-24 Swimming      Gold          8
   Michael Phelps  19 United States 2004          2004-08-29 Swimming      Gold          6
   Michael Phelps  27 United States 2012          2012-08-12 Swimming      Gold          4
 Natalie Coughlin  25 United States 2008          2008-08-24 Swimming      Gold          1

You can add a call to order() if you want a specific ordering.

Upvotes: 0

lukeA
lukeA

Reputation: 54237

Try

subset(melt(df, varnames = c("Gold", "Silver", "Bronze"), id.vars = 1:8, value.name = "Count", varnames = "Metal"), Count > 0)
# Athlete Age Country Year    Closing Ceremony Date   Sport   Metal Type  Metal Count
# Michael Phelps  23  United States   2008    8/24/2008   Swimming    Gold    8
# Michael Phelps  19  United States   2004    8/29/2004   Swimming    Gold    6
# Michael Phelps  19  United States   2004    8/29/2004   Swimming    Bronze  2
# Michael Phelps  27  United States   2012    8/12/2012   Swimming    Gold    4
# Michael Phelps  27  United States   2012    8/12/2012   Swimming    Silver  2
# Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Gold    1
# Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Silver  2
# Natalie Coughlin    25  United States   2008    8/24/2008   Swimming    Bronze  3

Upvotes: 1

Related Questions