Reputation: 2429
How can I split the following data.frame
df <- data.frame(var1 = c("a", 1, 2, 3, "a", 1, 2, 3, 4, 5, 6, "a", 1, 2), var2 = 1:14)
into a list of groups like this:
a 1
1 2
2 3
3 4
a 5
1 6
2 7
3 8
4 9
5 10
6 11
a 12
1 13
2 14
So basically, the value "a" in column 1 is the tag / identifier I want to split the data frame on. I know about the split function, but that means I have to add another column. As can be seen from my example, the size of the groups can vary, so I do not know how to create such a dummy column automatically.
Any ideas on that?
Cheers,
Sven
Upvotes: 6
Views: 6431
Reputation: 355
You could loop over the first column of the data frame and save the positions of the non-numeric entries in a vector. Thus, you'd have something like:
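data <- as.character(df$var1) #this gives you a vector of the values you'll sort through
positions <- c()
for (i in seq_along(data)) {
  # var1 holds strings, so is.numeric() would be FALSE for every element;
  # as.numeric() returns NA for entries that are not numbers, such as "a"
  if (is.na(suppressWarnings(as.numeric(data[i])))) {
    positions <- append(positions, i) #saves the positions of the non-numeric entries
  }
}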
With those positions, you shouldn't have a problem splitting up the data frame from there. It's just a matter of taking the row sequences between consecutive values in the position vector.
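As a rough sketch of that last step: assuming the tag positions are already known (found here with which() instead of the loop) and that every group runs from one tag up to the row before the next tag, the start and end rows can be paired up and used to index the data frame. The names starts, ends and groups are only illustrative.

```r
df <- data.frame(var1 = c("a", 1, 2, 3, "a", 1, 2, 3, 4, 5, 6, "a", 1, 2), var2 = 1:14)

# Rows where a new group begins (the "a" tags)
starts <- which(df$var1 == "a")
# Each group ends one row before the next tag; the last group ends at the last row
ends <- c(starts[-1] - 1, nrow(df))

# Pair each start with its end and pull out those rows as one data frame per group
groups <- Map(function(s, e) df[s:e, ], starts, ends)
```

Note that this assumes the first row is a tag; any rows before the first "a" would be left out of the result.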
Upvotes: 0
Reputation: 61983
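You could find which values of the indexing vector equal "a", then create a grouping variable based on that, and then use split. Since TRUE counts as 1 and FALSE as 0, cumsum of that logical vector goes up by one at every "a", so each row gets the number of the group it belongs to.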
df[,1] == "a"
# [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
#[13] FALSE FALSE
cumsum(df[,1] == "a")
# [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3
split(df, cumsum(df[,1] == "a"))
#$`1`
# var1 var2
#1 a 1
#2 1 2
#3 2 3
#4 3 4
#
#$`2`
# var1 var2
#5 a 5
#6 1 6
#7 2 7
#8 3 8
#9 4 9
#10 5 10
#11 6 11
#
#$`3`
# var1 var2
#12 a 12
#13 1 13
#14 2 14
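If the "a" rows themselves are not wanted in the result, one possible variation (not part of the question, so treat it as a sketch) is to compute the grouping vector first and then drop the tag rows from both the data frame and the grouping vector before splitting. The names is_tag, grp and pieces are made up for this example.

```r
df <- data.frame(var1 = c("a", 1, 2, 3, "a", 1, 2, 3, 4, 5, 6, "a", 1, 2), var2 = 1:14)

is_tag <- df$var1 == "a"   # TRUE on the rows that start a group
grp <- cumsum(is_tag)      # running group number for every row

# Keep only the non-tag rows and split them by the group they belonged to
pieces <- split(df[!is_tag, ], grp[!is_tag])
```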
Upvotes: 12