Reputation: 31
I have a data frame that looks like this:
S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20
I would like to filter and select the state(from S1State and S2State) that has the maximum value(from S1Value and S2Value). The result should look like this:
SState SValue
VIC 30
WA 30
I am new to R and have been experimenting with dplyr.
Upvotes: 1
Views: 146
Reputation: 93813
Just to show that this is far from impossible with standard R tools:
nams <- c("State","Value")
tmp <- reshape(dt, direction="long", varying=lapply(nams, grep, x=names(dt)),
v.names=nams, timevar=NULL)
tmp[with(tmp, Value == ave(Value, id, FUN=max)),]
# State Value id
#2.1 WA 30 2
#1.2 VIC 30 1
Upvotes: 2
Reputation: 39154
I assume that the OP may have more states in the data frame, such as S3State
, S4State
, ...
The following solutions are based on this assumption, trying to be able to process more than one states. If there are only two states, the approach proposed by @lebelinoz is simple and straightforward.
A solution using functions from dplyr
and tidyr
. dt2
is the final output.
# Load packages
library(dplyr)
library(tidyr)
# Process the data
dt2 <- dt %>%
gather(Num, Value, contains("Value")) %>%
gather(State, Name, contains("State")) %>%
# Only keep records with the same state number
filter(substring(Num, 1, 2) == substring(State, 1, 2)) %>%
mutate(Group = substring(Num, 1, 2)) %>%
group_by(Group) %>%
filter(Value == max(Value)) %>%
ungroup() %>%
select(SState = Name, SSValue = Value)
A solution using functions from dplyr
, purrr
, and stringr
. I loaded the package tidyverse
for the first two packages. Again, dt2
is the final output.
# Load packages
library(tidyverse)
library(stringr)
# Extract the column names
Col <- colnames(dt)
# Extract state numbers
ColNum <- Col %>%
str_extract(pattern = "[0-9]") %>%
unique()
# Design a function to process the data
dt_process <- function(pattern, dt){
dt2 <- dt %>%
# Extract columns based on a pattern (numbers)
select(dplyr::contains(pattern)) %>%
# Rename the columns
rename_all(~sub(pattern, "", .)) %>%
# Filter the maximum row
filter(SValue == max(SValue))
return(dt2)
}
# Apply the dt_process function
dt_list <- map(.x = ColNum, .f = dt_process, dt = dt)
# Bind all data frames
dt2 <- bind_rows(dt_list) %>% arrange(SState)
# Create example data frame
dt <- read.table(text = "S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 5068
The answer I was hinting at is as follows:
library(dplyr)
dt <- read.table(text = "S1State S1Value S2State S2Value
NSW 20 VIC 30
WA 30 NSW 20",
header = TRUE, stringsAsFactors = FALSE)
answer = dt %>%
mutate(SState = ifelse(S1Value > S2Value, S1State, S2State),
SValue = ifelse(S1Value > S2Value, S1Value, S2Value)) %>%
select(SState, SValue)
Upvotes: 2