Amateur

Reputation: 1277

split dataframe in R by row

I have a long dataframe like this:

  Row  Conc   group
  1     2.5    A
  2     3.0    A
  3     4.6    B
  4     5.0    B
  5     3.2    C
  6     4.2    C
  7     5.3    D
  8     3.4    D

...

The actual data have hundreds of rows. I would like to split the data frame into two parts: groups A to C in one, and group D in the other. I searched the web and found several solutions, but none of them applies to my case.

How to split a data frame?

For example: Case 1:

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

I don't want to split by an arbitrary number.

Case 2: Split by level/factor

data2 <- data[data$sum_points == 2500, ]

I don't want to split by a single factor either. Sometimes I want to combine many levels together.

Case 3: select by row number

newdf <- mydf[1:3,]

The actual data have hundreds of rows, so I don't know the row numbers. I only know the level I would like to split at.

Upvotes: 20

Views: 72044

Answers (4)

AndrewGB

Reputation: 16836

With base R, we can pass split() a logical condition on the column we want to split on.

split(df, df$group == "D")

# Or using `with`
with(df, split(df, group == "D"))

Output

$`FALSE`
  Row Conc group
1   1  2.5     A
2   2  3.0     A
3   3  4.6     B
4   4  5.0     B
5   5  3.2     C
6   6  4.2     C

$`TRUE`
  Row Conc group
7   7  5.3     D
8   8  3.4     D

To split on several levels at once, use %in%:

split(df, df$group %in% c("A", "D"))
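
The list elements above are named `TRUE` and `FALSE`, which can be awkward to work with. As a minimal sketch (the labels "A_and_D" and "others" are made up for illustration and are not from the question), the condition can be turned into readable labels before splitting:

```r
# Same sample data as in the Data block of this answer
df <- data.frame(Row = 1:8,
                 Conc = c(2.5, 3, 4.6, 5, 3.2, 4.2, 5.3, 3.4),
                 group = c("A", "A", "B", "B", "C", "C", "D", "D"))

# Turn the logical condition into descriptive labels.
# "A_and_D" and "others" are illustrative names only.
piece <- ifelse(df$group %in% c("A", "D"), "A_and_D", "others")
parts <- split(df, piece)

parts$A_and_D   # rows whose group is A or D
parts$others    # rows whose group is B or C
```

Each piece is then available by name, for example `parts$A_and_D`, instead of `` parts$`TRUE` ``.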

Another option is group_split() from dplyr, but you need to create a grouping variable first to split on.

library(dplyr)

df %>% 
  mutate(spl = ifelse(group == "D", 1, 0)) %>% 
  group_split(spl, .keep = FALSE)

Data

df <- structure(list(Row = 1:8, Conc = c(2.5, 3, 4.6, 5, 3.2, 4.2, 
5.3, 3.4), group = c("A", "A", "B", "B", "C", "C", "D", "D")),
class = "data.frame", row.names = c(NA, -8L))

Upvotes: 1

Mikko

Reputation: 7755

For those who end up here through internet search engines time after time, the answer to the question in the title is:

x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

split(x, sort(as.numeric(rownames(x))))

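This splits the data frame into a list of one-row data frames, assuming that its row names are numeric and in order. split(x, rownames(x)) also works, but the result is rearranged because the row names are then sorted as text ("1", "10", "11", ...).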
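
If the goal is blocks of several rows rather than one row per piece, a similar idea can be sketched with a chunk index built from the row positions. The chunk size of 10 below is an arbitrary value chosen for illustration:

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

chunk_size <- 10  # hypothetical chunk size, chosen only for illustration

# Rows 1-10 get label 1, rows 11-20 get label 2, and so on
chunk_id <- ceiling(seq_len(nrow(x)) / chunk_size)
chunks <- split(x, chunk_id)

# Number of rows in each chunk
sapply(chunks, nrow)
```

Because the labels are numeric, split() keeps the chunks in their original row order; the last chunk simply holds whatever rows remain.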

Upvotes: 12

A5C1D2H2I1M1N2O1R2T1

Reputation: 193497

You may consider using the recode() function from the "car" package.

# Load the library and make up some sample data
library(car)
set.seed(1)
dat <- data.frame(Row = 1:100,
                  Conc = runif(100, 0, 10),
                  group = sample(LETTERS[1:10], 100, replace = TRUE))

Currently, dat$group contains the upper case letters A to J. Imagine we wanted the following four groups:

  • "one" = A, B, C
  • "two" = D, E, J
  • "three" = F, I
  • "four" = G, H

Now, use recode() (note the semicolons and the nested quotes).

recodes <- recode(dat$group, 
                 'c("A", "B", "C") = "one"; 
                  c("D", "E", "J") = "two"; 
                  c("F", "I") = "three"; 
                  c("G", "H") = "four"')
split(dat, recodes)
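
If you would rather not depend on an extra package, the same regrouping can be sketched in base R with a named lookup vector. This is an alternative to recode(), not part of the car package; the names `lookup` and `new_group` are illustrative:

```r
set.seed(1)
dat <- data.frame(Row = 1:100,
                  Conc = runif(100, 0, 10),
                  group = sample(LETTERS[1:10], 100, replace = TRUE))

# Named vector mapping each original level to its new group label
lookup <- c(A = "one", B = "one", C = "one",
            D = "two", E = "two", J = "two",
            F = "three", I = "three",
            G = "four", H = "four")

# as.character() guards against group being a factor
# (the data.frame() default before R 4.0)
new_group <- unname(lookup[as.character(dat$group)])
parts <- split(dat, new_group)

names(parts)
```

Note that split() orders the resulting list by the sorted labels, not by the order in which they appear in `lookup`.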

Upvotes: 0

Señor O

Reputation: 17412

It sounds like you want two data frames: one containing groups A, B, and C, and one containing only D. In that case you could do:

Data1 <- subset(Data, group %in% c("A","B","C"))
Data2 <- subset(Data, group=="D")

Correct me if you were asking something different.
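
If listing every level by hand is impractical, one possible sketch is to cut the data frame at the first row where the target level appears. This assumes the rows are already sorted by group and that "D" actually occurs (match() returns NA otherwise); the sample data mirrors the table in the question and `first_d` is an illustrative name:

```r
Data <- data.frame(Row = 1:8,
                   Conc = c(2.5, 3, 4.6, 5, 3.2, 4.2, 5.3, 3.4),
                   group = c("A", "A", "B", "B", "C", "C", "D", "D"))

# Position of the first row whose group is "D"
# (assumes rows are sorted by group and "D" is present)
first_d <- match("D", Data$group)

# Rows before that position go in one piece, the rest in the other
pieces <- split(Data, seq_len(nrow(Data)) >= first_d)
Data1 <- pieces[["FALSE"]]   # groups before D
Data2 <- pieces[["TRUE"]]    # D and everything after it
```

This way only the level to split at needs to be known, not the row numbers.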

Upvotes: 11
