Reputation: 7517
I have a data.frame called c41 (read from the CSV in the code below). Some column names (e.g., type) in this data frame are repeated once or twice. As a result, R appends a ".number" suffix to distinguish between them.
Suppose I want to keep the rows where type == 3 in any of the columns whose names have a "type" root. Currently, I drop the ".number" suffixes and then subset, but that incorrectly returns nothing.
Question: In BASE R, how can I subset on a variable value (type == 3) without needing to include the ".number" suffixes (e.g., type == 3 instead of type.1 == 3)?
In other words, how can I find any "type" column whose value is 3, regardless of its numeric suffix?
c41 <- read.csv("https://raw.githubusercontent.com/izeh/l/master/c4.csv")
c42 <- setNames(c41, sub("\\.\\d+$", "", names(c41))) # Strip the ".number" suffixes
subset(c42, type == 3) # Now subset, but this returns no rows!
Upvotes: 0
Views: 118
Reputation: 34406
Renaming the columns to make them non-unique is a recipe for a headache and is not advisable. Without renaming the columns, in base R you could do something like this instead:
c41[rowSums(c41[grep("^type", names(c41))] == 3, na.rm = TRUE) > 0,]
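To make the one-liner easier to follow, here is a minimal sketch that unpacks it step by step. The data frame toy and its values are made up for illustration (they are not the real c4.csv); only the column naming mimics the suffixed names in c41.

```r
# Hypothetical stand-in for c41, with suffixed names like those in the question
toy <- data.frame(
  type   = c(1, 2, 4),
  x      = c(10, 20, 30),
  type.1 = c(3, NA, 5),
  type.2 = c(0, 3, NA)
)

# 1. Positions of every column whose name starts with "type"
type_cols <- grep("^type", names(toy))

# 2. Compare those columns with 3; the result is a logical matrix (NA where data are missing)
is_three <- toy[type_cols] == 3

# 3. A row is kept if at least one "type" column equals 3; na.rm = TRUE ignores the NAs
keep <- rowSums(is_three, na.rm = TRUE) > 0

# 4. Subset the rows
result <- toy[keep, ]
print(result)
```

Note that "^type" also matches any other column whose name merely begins with "type"; if that is a concern, a stricter pattern such as "^type(\\.\\d+)?$" should restrict the match to type, type.1, type.2, and so on.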
I don't think subset() can be used here if column names are duplicated.
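To illustrate why duplicated names are a problem, here is a small sketch on a made-up data frame (toy and its values are hypothetical, not the real c4.csv). As far as I can tell, a bare name such as type resolves to the first column with that name, so later duplicates are never checked.

```r
# Hypothetical data frame with a duplicated column name;
# check.names = FALSE stops data.frame() from adding suffixes
toy <- data.frame(type = c(1, 2), x = c(10, 20), type = c(3, 3),
                  check.names = FALSE)

names(toy)            # the name "type" appears twice

# Lookup by name returns only the first matching column
first_type <- toy[["type"]]

# subset() evaluates `type` the same way, so the second "type" column
# (the one that actually holds the 3s) is never consulted
n_rows <- nrow(subset(toy, type == 3))
print(n_rows)
```

If that reading is right, it would also explain why the renamed c42 in the question gives back no rows: the first type column presumably contains no 3s.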
Upvotes: 2
Reputation: 33782
EDIT: I see that you edited your question to specify base R. Can't help you there! But perhaps a dplyr solution is of interest.
You can use dplyr::filter_at and the starts_with helper.
library(dplyr)
library(readr)
c4 <- read_csv("https://raw.githubusercontent.com/izeh/l/master/c4.csv")
c4 %>%
  filter_at(vars(starts_with("type")), any_vars(. == 3))
Adding a select_at to display just the relevant columns:
c4 %>%
  filter_at(vars(starts_with("type")), any_vars(. == 3)) %>%
  select_at(vars(starts_with("type")))
Result:
# A tibble: 2 x 2
   type type_1
  <dbl>  <dbl>
1     1      3
2     2      3
Upvotes: 1