Reputation: 332
INPUT is a csv file which contains the below.
Athlete Age Country Year Closing Ceremony Date Sport Gold Silver Bronze
Michael Phelps 23 United States 2008 8/24/2008 Swimming 8 0 0
Michael Phelps 19 United States 2004 8/29/2004 Swimming 6 0 2
Michael Phelps 27 United States 2012 8/12/2012 Swimming 4 2 0
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming 1 2 3
EXPECTED OUTPUT i need to get
Athlete Age Country Year Closing Ceremony Date Sport Metal Type Metal Count
Michael Phelps 23 United States 2008 8/24/2008 Swimming Gold 8
Michael Phelps 19 United States 2004 8/29/2004 Swimming Gold 6
Michael Phelps 19 United States 2004 8/29/2004 Swimming Bronze 2
Michael Phelps 27 United States 2012 8/12/2012 Swimming Gold 4
Michael Phelps 27 United States 2012 8/12/2012 Swimming Silver 2
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Gold 1
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Silver 2
Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Bronze 3
Can any one help on this
But i was trying this with melt but output is not expected one .I used this command
> g <- melt(data)
I am Getting like this
Athlete Country Closing.Ceremony.Date Sport variable value
1 Michael Phelps United States 8/24/2008 Swimming Age 23
2 Michael Phelps United States 8/29/2004 Swimming Age 19
3 Michael Phelps United States 8/12/2012 Swimming Age 27
4 Natalie Coughlin United States 8/24/2008 Swimming Age 25
5 Michael Phelps United States 8/24/2008 Swimming Year 2008
6 Michael Phelps United States 8/29/2004 Swimming Year 2004
7 Michael Phelps United States 8/12/2012 Swimming Year 2012
8 Natalie Coughlin United States 8/24/2008 Swimming Year 2008
9 Michael Phelps United States 8/24/2008 Swimming Gold 8
10 Michael Phelps United States 8/29/2004 Swimming Gold 6
11 Michael Phelps United States 8/12/2012 Swimming Gold 4
12 Natalie Coughlin United States 8/24/2008 Swimming Gold 1
13 Michael Phelps United States 8/24/2008 Swimming Silver 0
14 Michael Phelps United States 8/29/2004 Swimming Silver 0
15 Michael Phelps United States 8/12/2012 Swimming Silver 2
16 Natalie Coughlin United States 8/24/2008 Swimming Silver 2
17 Michael Phelps United States 8/24/2008 Swimming Bronze 0
18 Michael Phelps United States 8/29/2004 Swimming Bronze 2
19 Michael Phelps United States 8/12/2012 Swimming Bronze 0
20 Natalie Coughlin United States 8/24/2008 Swimming Bronze 3
But this not my expected output . Thanks in advance
Upvotes: 1
Views: 131
Reputation: 193517
For variety, here's a concise approach in base R. Conceptually it is the same as the approach using melt
(using @akrun's sample data):
subset(cbind(data[1:6], stack(data[-c(1:6)])), values > 0)
# Athlete Age Country Year Closing.Ceremony.Date Sport
# 1 Michael Phelps 23 United States 2008 8/24/2008 Swimming
# 2 Michael Phelps 19 United States 2004 8/29/2004 Swimming
# 3 Michael Phelps 27 United States 2012 8/12/2012 Swimming
# 4 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# 7 Michael Phelps 27 United States 2012 8/12/2012 Swimming
# 8 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# 10 Michael Phelps 19 United States 2004 8/29/2004 Swimming
# 12 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# values ind
# 1 8 Gold
# 2 6 Gold
# 3 4 Gold
# 4 1 Gold
# 7 2 Silver
# 8 2 Silver
# 10 2 Bronze
# 12 3 Bronze
Upvotes: 1
Reputation: 887118
We could use melt
specifying the 'id' columns. Based on the expected output, the columns that are converted from 'wide' to 'long' form are 7:9 ie. Gold, Silver, Bronze columns. After we convert to long
form, remove the rows that are '0' for the 'value' column with subset
library(reshape2)
subset(melt(data, id.var=1:6), value!=0)
# Athlete Age Country Year Closing.Ceremony.Date Sport
#1 Michael Phelps 23 United States 2008 8/24/2008 Swimming
#2 Michael Phelps 19 United States 2004 8/29/2004 Swimming
#3 Michael Phelps 27 United States 2012 8/12/2012 Swimming
#4 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
#7 Michael Phelps 27 United States 2012 8/12/2012 Swimming
#8 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
#10 Michael Phelps 19 United States 2004 8/29/2004 Swimming
#12 Natalie Coughlin 25 United States 2008 8/24/2008 Swimming
# variable value
#1 Gold 8
#2 Gold 6
#3 Gold 4
#4 Gold 1
#7 Silver 2
#8 Silver 2
#10 Bronze 2
#12 Bronze 3
You could also use arguments such as value.name
and var.name
within the melt
to change the 'default' variable/value
column in the above.
Or the same could be done with gather
from tidyr
.
library(dplyr)
library(tidyr)
gather(data, Type, MedalCount, 7:9) %>%
filter(MedalCount>0)
data <- structure(list(Athlete = c("Michael Phelps", "Michael Phelps",
"Michael Phelps", "Natalie Coughlin"), Age = c(23L, 19L, 27L,
25L), Country = c("United States", "United States", "United States",
"United States"), Year = c(2008L, 2004L, 2012L, 2008L),
Closing.Ceremony.Date = c("8/24/2008",
"8/29/2004", "8/12/2012", "8/24/2008"), Sport = c("Swimming",
"Swimming", "Swimming", "Swimming"), Gold = c(8L, 6L, 4L, 1L),
Silver = c(0L, 0L, 2L, 2L), Bronze = c(0L, 2L, 0L, 3L)),
.Names = c("Athlete",
"Age", "Country", "Year", "Closing.Ceremony.Date", "Sport", "Gold",
"Silver", "Bronze"), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 1
Reputation: 35314
Try:
r> input <- data.frame(Athlete=c('Michael Phelps','Michael Phelps','Michael Phelps','Natalie Coughlin'), Age=c(23L,19L,27L,25L), Country=c('United States','United States','United States','United States'), Year=c(2008L,2004L,2012L,2008L), ClosingCeremonyDate=c(as.Date('2008-8-24'),as.Date('2004-8-29'),as.Date('2012-8-12'),as.Date('2008-8-24')), Sport=c('Swimming','Swimming','Swimming','Swimming'), Gold=c(8L,6L,4L,1L), Silver=c(0L,0L,2L,2L), Bronze=c(0L,2L,0L,3L) );
r> output <- subset(do.call(rbind, lapply(c('Bronze','Silver','Gold'), function(m) cbind(input[,-which(colnames(input)%in%c('Bronze','Silver','Gold'))], MedalType=m, MedalCount=input[,which(colnames(input)==m)] ) ) ), MedalCount>0 );
r> print(output, row.names=F );
Athlete Age Country Year ClosingCeremonyDate Sport MedalType MedalCount
Michael Phelps 19 United States 2004 2004-08-29 Swimming Bronze 2
Natalie Coughlin 25 United States 2008 2008-08-24 Swimming Bronze 3
Michael Phelps 27 United States 2012 2012-08-12 Swimming Silver 2
Natalie Coughlin 25 United States 2008 2008-08-24 Swimming Silver 2
Michael Phelps 23 United States 2008 2008-08-24 Swimming Gold 8
Michael Phelps 19 United States 2004 2004-08-29 Swimming Gold 6
Michael Phelps 27 United States 2012 2012-08-12 Swimming Gold 4
Natalie Coughlin 25 United States 2008 2008-08-24 Swimming Gold 1
You can add a call to order()
if you want a specific ordering.
Upvotes: 0
Reputation: 54237
Try
subset(melt(df, varnames = c("Gold", "Silver", "Bronze"), id.vars = 1:8, value.name = "Count", varnames = "Metal"), Count > 0)
# Athlete Age Country Year Closing Ceremony Date Sport Metal Type Metal Count
# Michael Phelps 23 United States 2008 8/24/2008 Swimming Gold 8
# Michael Phelps 19 United States 2004 8/29/2004 Swimming Gold 6
# Michael Phelps 19 United States 2004 8/29/2004 Swimming Bronze 2
# Michael Phelps 27 United States 2012 8/12/2012 Swimming Gold 4
# Michael Phelps 27 United States 2012 8/12/2012 Swimming Silver 2
# Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Gold 1
# Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Silver 2
# Natalie Coughlin 25 United States 2008 8/24/2008 Swimming Bronze 3
Upvotes: 1