subsetting from same-named data.frame in R

Question

I have a data.frame called c41 (read in below). Some column names (e.g., type) are repeated once or twice in this data frame. As a result, R appends a ".number" suffix to distinguish between them.

Suppose I want to keep the rows where type == 3 in any column whose name has the root "type". Currently, I drop the ".number" suffixes and then subset, but that incorrectly returns nothing.

Question: In BASE R, how can I subset a variable value (type == 3) without needing to include the ".number" suffixes (e.g., type == 3 instead of type.1 == 3)?

In other words, how can I find any "type" whose value is 3, regardless of its numeric suffix?

c41 <- read.csv("https://raw.githubusercontent.com/izeh/l/master/c4.csv")

c42 <- setNames(c41, sub("\\.\\d+$", "", names(c41))) # Take off the ".number" suffixes (backslashes must be doubled in an R string)

subset(c42, type == 3) # Now subset! But it returns nothing!

lroha · Accepted Answer

Renaming the columns to make them non-unique is a recipe for a headache and is not advisable. Without renaming the columns, in base R you could do something like this instead:

c41[rowSums(c41[grep("^type", names(c41))] == 3, na.rm = TRUE) > 0,]
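To see what each piece of that one-liner does, here is a minimal sketch on a small made-up data frame. The name toy and its values are hypothetical stand-ins for c41, not the real data.

```r
# Hypothetical stand-in for c41; data.frame() makes the repeated
# "type" names unique by appending .1 and .2
toy <- data.frame(id   = 1:5,
                  type = c(1, 3, 2, NA, 1),
                  type = c(3, 1, 2, 2, 2),
                  type = c(2, 2, 3, NA, 1))

type_cols <- grep("^type", names(toy))   # positions of type, type.1, type.2
hits <- toy[type_cols] == 3              # logical matrix; NA where data is missing
keep <- rowSums(hits, na.rm = TRUE) > 0  # TRUE if any type column equals 3
result <- toy[keep, ]                    # rows with id 1, 2 and 3
```

The na.rm = TRUE argument stops a missing value in one type column from turning the whole row's count into NA. Note that "^type" would also match an unrelated column such as typeface; if that is a concern, a stricter pattern like "^type(\\.\\d+)?$" might be safer.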

I don't think subset() can be used here if column names are duplicated.
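A small sketch of why duplicated names get in the way, using a hypothetical two-column frame called dup. Looking a column up by name only ever reaches one of the same-named columns, and subset() resolves type by name too, so the condition cannot be tested against all of them at once.

```r
# Hypothetical frame with a deliberately duplicated column name
dup <- data.frame(type = c(1, 2, 2), type = c(3, 1, 2), check.names = FALSE)

first_only <- dup[["type"]]   # name lookup returns the first match only
any(first_only == 3)          # FALSE: the 3 sits in the second "type" column

# Selecting by a logical index instead of by name reaches both columns
all_type <- dup[names(dup) == "type"]
keep <- rowSums(all_type == 3, na.rm = TRUE) > 0
result <- dup[keep, ]         # the first row, whose second "type" is 3
```

This is the same rowSums() idea as above, only with the columns picked by comparing names rather than by a regular expression.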
