Reputation: 360
I have seen the question "Subsetting a data frame based on a logical condition on a subset of rows" and this article: https://statisticsglobe.com/filter-data-frame-rows-by-logical-condition-in-r
I want to subset a data.frame according to a specific value in one of its columns (group).
data <- data.frame(x1 = c(3, 7, 1, 8, 5),   # Create example data
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))
data # Print example data
# x1 x2 group
# 3 a ga1
# 7 b ga2
# 1 c gb1
# 8 d gc3
# 5 e gb1
I want to subset data according to group: one subset should contain the rows whose group contains "a", one the rows whose group contains "b", and one the rows whose group contains "c". Maybe something with grepl?
The result should look like this:
data.a
# x1 x2 group
# 3 a ga1
# 7 b ga2
data.b
# x1 x2 group
# 1 c gb1
# 5 e gb1
data.c
# x1 x2 group
# 8 d gc3
I would be interested in how to produce even one of these subsets; a loop over the groups would also be fine.
I modified the example from here https://statisticsglobe.com/filter-data-frame-rows-by-logical-condition-in-r
Upvotes: 0
Views: 294
Reputation: 1388
Good question. This solution uses inputs and outputs that closely match the request: "I want to subset data according to group. One subset should be the rows containing a in their group, one containing b in their group and one c. Maybe something with grepl?"
The code below uses the data frame from the question (named data), finds the matching row indices for each group with grep(), and subsets data by those indices.
code:
ga <- grep("ga", data$group)  # row indices whose group contains "ga"
gb <- grep("gb", data$group)  # row indices whose group contains "gb"
gc <- grep("gc", data$group)  # row indices whose group contains "gc"
ga1 <- data[ga, ]  # subset ga
gb1 <- data[gb, ]  # subset gb
gc1 <- data[gc, ]  # subset gc
print(ga1)
print(gb1)
print(gc1)
This was run in JupyterLab on Windows, and the output closely matches the expected output shown in the question.
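Since the question also mentions grepl and a loop, the three grep() calls above can be generalized into one loop. This is a minimal sketch, assuming every group name starts with "g" followed by the letter of interest; grepl() returns a TRUE/FALSE vector instead of row indices, and the results are collected in a named list (subsets and letter are hypothetical names chosen for illustration).

```r
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))

subsets <- list()
for (letter in c("a", "b", "c")) {
  # grepl() gives one TRUE/FALSE per row; "^g" anchors the pattern at the
  # start of the string, assuming all groups look like g + letter + digits
  rows <- grepl(paste0("^g", letter), data$group)
  subsets[[letter]] <- data[rows, ]
}

subsets$b  # rows 3 and 5 (group "gb1")
```

Keeping the pieces in one list makes it easy to loop over them later; subsets$a, subsets$b and subsets$c correspond to data.a, data.b and data.c in the question.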
Upvotes: 1
Reputation: 887158
We can use group_split with str_remove from the tidyverse (dplyr and stringr):
library(dplyr)
library(stringr)
# grp = group without its trailing digits; .keep = FALSE drops grp from the result
data %>%
  group_split(grp = str_remove(group, "\\d+$"), .keep = FALSE)
Upvotes: 1
Reputation: 388982
Extract the part of group that you want to split on:
sub('\\d+', '', data$group)
#[1] "ga" "ga" "gb" "gc" "gb"
and pass it to split to divide the data into groups.
new_data <- split(data, sub('\\d+', '', data$group))
new_data
#$ga
# x1 x2 group
#1 3 a ga1
#2 7 b ga2
#$gb
# x1 x2 group
#3 1 c gb1
#5 5 e gb1
#$gc
# x1 x2 group
#4 8 d gc3
It is better to keep the data in a list; however, if you want a separate data frame for each group, you can use list2env:
list2env(new_data, .GlobalEnv)
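If you want the objects named exactly as in the question (data.a, data.b, data.c), one option is to split on just the letter after "g" and prefix the list names before calling list2env. This sketch assumes every group follows the pattern g + one lowercase letter + digits, and it assigns into a fresh environment (the hypothetical out) instead of .GlobalEnv so existing objects are not overwritten.

```r
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))

# Keep only the letter after the leading "g": "ga1" -> "a", "gc3" -> "c"
key <- sub("^g([a-z]).*$", "\\1", data$group)

# split() returns a list named by the sorted unique keys: "a", "b", "c"
new_data <- split(data, key)
names(new_data) <- paste0("data.", names(new_data))

# Create data.a, data.b and data.c as separate objects inside 'out'
out <- new.env()
invisible(list2env(new_data, envir = out))

out$data.b
```

Passing .GlobalEnv instead of out, as in the answer above, creates data.a, data.b and data.c directly in the workspace.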
Upvotes: 1