Reputation: 23
I'm still a bit of a novice with R and I'm trying to create a column that has a value based on a condition across the other rows. Basically, I want to create a 4th column (in my dummy data below) that has the year (row heading) of the max value.
country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")
x2000 = c(50, 100, 75, 60, 25)
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
df = data.frame(country, region, x2000, x2001, x2002)
country region x2000 x2001 x2002
1 USA North America 50 200 125
2 Canada North America 100 50 75
3 UK Europe 75 150 60
4 France Europe 60 125 75
5 Mexico North America 25 60 25
So, I want it to look like this:
country = c("USA", "Canada", "UK", "France", "Mexico")
region = c("North America", "North America", "Europe", "Europe", "North America")
x2000 = c(50, 100, 75, 60, 25)
x2001 = c(200, 50, 150, 125, 60)
x2002 = c(125,75,60, 75, 25)
Max_year = (2001, 2000, 2001, 2001, 2001)
df = data.frame(country, region, x2000, x2001, x2002, Max)
country region x2000 x2001 x2002 Max_Year
1 USA North America 50 200 125 2001
2 Canada North America 100 50 75 2000
3 UK Europe 75 150 60 2001
4 France Europe 60 125 75 2001
5 Mexico North America 25 60 25 2001
I found this which seems so far to be the most similar to what I'm trying to do ({if...else..} statement after group_by in dplyr chain). But I want it to return a value based on the max of all potential values.
Of course, my actual data is much much larger, and I need it to group the data by two variables (region here is simplified but I need to keep it in my dataset) and then apply the condition across the data within each "group"
Here's what I was trying to work out based on the example question I had found.
Data_Tidy$Max_Year <- Data_Tidy %>%
group_by(Data_Tidy$Country, Data_Tidy$area)%>%
do(model=if(Data_Tidy$Landing = pmax())
I can either have the years as column headings (as above) or as a tidy long data.frame, and then collapse back to a wide frame.
country region Year Value
1 USA North America 2000 50
2 Canada North America 2000 100
3 UK Europe 2000 75
4 France Europe 2000 60
5 Mexico North America 2000 25
6 USA North America 2001 200
7 Canada North America 2001 50
8 UK Europe 2001 150
9 France Europe 2001 125
10 Mexico North America 2001 60
11 USA North America 2002 125
12 Canada North America 2002 75
13 UK Europe 2002 60
14 France Europe 2002 75
15 Mexico North America 2002 25
I hope this is clear. Any advice would be greatly appreciated!!
Upvotes: 1
Views: 1125
Reputation: 145
You can use apply rowwise to get the indices for the max columns.
df$max <- names(df)[apply(df, 1, function(x) which.max(x[3:6]))+2]
df
country region x2000 x2001 x2002 max
1 USA North America 50 200 125 x2001
2 Canada North America 100 50 75 x2000
3 UK Europe 75 150 60 x2001
4 France Europe 60 125 75 x2001
5 Mexico North America 25 60 25 x2001
You can easily get rid of the 'x' before the year. Or use a vector c(2000, 2001, 2002) instead of the data.frame names.
Upvotes: 2
Reputation: 79348
transform(df,max_year=as.numeric(sub("\\D","",names(df[-(1:2)])[max.col(df[-(1:2)])])))
country region x2000 x2001 x2002 max_year
1 USA North America 50 200 125 2001
2 Canada North America 100 50 75 2000
3 UK Europe 75 150 60 2001
4 France Europe 60 125 75 2001
5 Mexico North America 25 60 25 2001
Upvotes: 0