Reputation: 8986
Suppose I have the following dataframe:
df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))
I want to create a list of dataframes, separating them by the first part of the column name, i.e., the columns that start with "BR" would be one element of the list, the columns that start with "USA" would be another, and so on.
I'm able to get the column names and split them using strsplit. However, I'm not sure what the best way would be to iterate over the result and split the dataframe.
strsplit(names(df), "\\.")
gives me a list with one top-level element per column name, where each element holds that name split on ".".
How could I iterate over this list to get the indices of the columns that start with the same substring, and then group those columns as elements of another list?
Upvotes: 4
Views: 755
Reputation: 27359
Dason beat me to it, but here's a different flavor of the same conceptual approach:
library(plyr)
# Use regex to get the prefixes
# Captures any word characters (letters, digits or "_"; "\\w*") from the
# beginning of the string ("^") up to the first period ("\\.") in a group, then
# matches all the remaining characters (".*"). The replacement "\\1" keeps only
# that first group, "(\\w*)".
# In other words, it matches the whole string but replaces it with only the prefix.
prefixes <- unique(gsub(pattern = "^(\\w*)\\..*",
                        replacement = "\\1",
                        x = names(df)))
# Subset to the variables that match the prefix
# Iterates over the prefixes and subsets based on the variable names that
# match that prefix
llply(prefixes, .fun = function(x){
  # Match the prefix followed by a literal ".", so that "BR" does not also
  # pick up a column such as "BRA.a"
  subset(df, select = grep(paste0("^", x, "\\."), names(df), value = TRUE))
})
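If you'd rather not load plyr, the same loop can be sketched in base R. This is only a rough equivalent, not a drop-in replacement: by_prefix is a name I made up, startsWith() needs R 3.3.0 or later, and I'm assuming every column name contains a ".".

```r
# Same example data as in the question
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Prefixes, in order of first appearance
prefixes <- unique(sub("^(\\w*)\\..*", "\\1", names(df)))

# One data frame per prefix; drop = FALSE keeps a single-column group
# as a data frame instead of collapsing it to a vector
by_prefix <- lapply(prefixes, function(p) {
  df[, startsWith(names(df), paste0(p, ".")), drop = FALSE]
})

# Label each list element with its prefix (by_prefix is a hypothetical name)
names(by_prefix) <- prefixes
```

Naming the elements lets you pull out a group with by_prefix$USA or by_prefix[["FRA"]] rather than by position.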
I think these regexes should still give you the right results even if there are "." later in variable names:
unique(gsub(pattern = "^(\\w*)\\..*",
replace = "\\1",
x = c(names(df), "FRA.c.blahblah")))
Or if a prefix appears later in a variable name:
# Add a USA variable with "FRA" in it
df2 <- data.frame(df, USA.FRANKLINS = rnorm(10))
prefixes2 <- unique(gsub(pattern = "^(\\w*)\\..*",
replace = "\\1",
x = names(df2)))
llply(prefixes2, .fun = function(x){
  # Match the prefix followed by a literal ".", so that "FRA" only matches
  # columns whose name starts with "FRA."
  subset(df2, select = grep(paste0("^", x, "\\."), names(df2), value = TRUE))
})
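One case the examples above don't cover is a prefix that is itself the start of another prefix. A quick sketch with made-up column names (nms, and in particular BRA.a, are hypothetical and not from the question) shows why it can help to include the literal "." in the pattern:

```r
# Hypothetical column names: "BR" is also the start of "BRA"
nms <- c("BR.a", "BR.b", "BRA.a", "USA.FRANKLINS")

# Anchoring at the start only: "BRA.a" is picked up along with the BR columns
loose <- grep("^BR", nms, value = TRUE)

# Anchoring at the start and requiring the "." right after the prefix
strict <- grep("^BR\\.", nms, value = TRUE)

print(loose)
print(strict)
```

With the stricter pattern, only names that begin with the full prefix plus the separator end up in the BR group.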
Upvotes: 3
Reputation: 61983
This will only work if the column names are always in the form you have them (split based on ".") and you want to group based on the identifier before the first ".".
df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))
## Grab the component of the names we want
nm <- do.call(rbind, strsplit(colnames(df), "\\."))[,1]
## Create list with custom function using lapply
## (drop = FALSE keeps a group with a single column as a data frame)
datlist <- lapply(unique(nm), function(x){df[, nm == x, drop = FALSE]})
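If a list named by prefix would be handier, a variation on the same idea is to let split() group the column names first. This is just a sketch under the same assumption (group on whatever comes before the first "."); prefix, cols_by_prefix and datlist_named are names I picked for illustration.

```r
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Everything before the first "." (a name without a "." is left unchanged)
prefix <- sub("\\..*$", "", names(df))

# Group the column names by prefix; split() names the groups by prefix
# and orders them by sorted factor level rather than by first appearance
cols_by_prefix <- split(names(df), prefix)

# One data frame per group; drop = FALSE guards against one-column groups
datlist_named <- lapply(cols_by_prefix, function(cols) df[, cols, drop = FALSE])

names(datlist_named)
```

Note that the groups come back in sorted order here, whereas unique(nm) keeps the order in which the prefixes first appear in the data frame.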
Upvotes: 4