Reputation: 159
For example, I have a data frame like this:
df <- data.frame(x = 1:104, y = runif(104))
I want to split it into 5 parts, with the 4 extra rows at the end of df going into the last part of the list.
I've tried the code below, but it either spreads those last 4 rows across the first 4 parts of z or puts them all into the first part of the list.
z <- split(df, rep(1:5, length.out = nrow(df), each = ceiling(nrow(df)/5)))
z <- split(df, rep(1:5, length.out = nrow(df), each = floor(nrow(df)/5)))
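One way to see what each attempt does is to tabulate the grouping vector before passing it to split. This is only a small sketch for the 104-row case; the names g_ceiling and g_floor are made up here for illustration.

```r
n <- 104  # nrow(df) in the example above

# First attempt: each = ceiling(104 / 5) = 21, so the 105-long pattern is
# truncated to 104 and the last group comes up one row short
g_ceiling <- rep(1:5, length.out = n, each = ceiling(n / 5))

# Second attempt: each = floor(104 / 5) = 20, so the 100-long pattern is
# recycled and the 4 leftover rows wrap around to group 1
g_floor <- rep(1:5, length.out = n, each = floor(n / 5))

table(g_ceiling)  # group sizes: 21 21 21 21 20
table(g_floor)    # group sizes: 24 20 20 20 20
```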
What I want is to split df into 5 parts, and if the number of rows in df isn't divisible by 5, I want the extra rows to go into the last part of z.
Thanks, I appreciate your response.
Upvotes: 3
Views: 143
Reputation: 101335
If you want to use rep, you can try the code below:
split(df, pmin(5, rep(1:nrow(df), each = nrow(df) %/% 5, length.out = nrow(df))))
which gives
> split(df, pmin(5, rep(1:nrow(df), each = nrow(df) %/% 5, length.out = nrow(df))))
$`1`
x y
1 1 0.3550119
2 2 0.6470292
3 3 0.2884533
4 4 0.2585698
5 5 0.2149870
6 6 0.4838779
7 7 0.5563049
8 8 0.7226308
9 9 0.4572367
10 10 0.6881226
11 11 0.4019648
12 12 0.7002047
13 13 0.4223471
14 14 0.5522625
15 15 0.1411530
16 16 0.2255064
17 17 0.5018320
18 18 0.4336198
19 19 0.5347132
20 20 0.9786630
$`2`
x y
21 21 0.79745741
22 22 0.93526757
23 23 0.95062238
24 24 0.23647215
25 25 0.68807492
26 26 0.41958031
27 27 0.15992180
28 28 0.87312563
29 29 0.25475743
30 30 0.93269723
31 31 0.44204740
32 32 0.85796635
33 33 0.92365963
34 34 0.09726958
35 35 0.32165244
36 36 0.01040126
37 37 0.30177817
38 38 0.52179936
39 39 0.16403744
40 40 0.45221789
$`3`
x y
41 41 0.50989782
42 42 0.35908556
43 43 0.65247200
44 44 0.71794047
45 45 0.29218576
46 46 0.29292081
47 47 0.42704465
48 48 0.78613530
49 49 0.75964984
50 50 0.20222792
51 51 0.39469212
52 52 0.46867551
53 53 0.06541241
54 54 0.54355706
55 55 0.17482056
56 56 0.33274424
57 57 0.59778381
58 58 0.74926797
59 59 0.54474127
60 60 0.68199123
$`4`
x y
61 61 0.76397084
62 62 0.57889274
63 63 0.96529543
64 64 0.49457189
65 65 0.13116258
66 66 0.66721471
67 67 0.92634127
68 68 0.02851204
69 69 0.34144727
70 70 0.99429707
71 71 0.40413354
72 72 0.67587272
73 73 0.39743324
74 74 0.41456676
75 75 0.35349059
76 76 0.09776186
77 77 0.90038111
78 78 0.84815278
79 79 0.32220149
80 80 0.89050820
$`5`
x y
81 81 0.08374602
82 82 0.28884681
83 83 0.16546834
84 84 0.59190647
85 85 0.79943630
86 86 0.54265060
87 87 0.49021329
88 88 0.56441657
89 89 0.19569261
90 90 0.19457069
91 91 0.95292865
92 92 0.60141956
93 93 0.53222257
94 94 0.53326103
95 95 0.01456218
96 96 0.89709932
97 97 0.41807969
98 98 0.03380717
99 99 0.11439988
100 100 0.96031135
101 101 0.28247647
102 102 0.49453450
103 103 0.99614388
104 104 0.66640157
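The full printout is long, so it may be easier to check the result through the grouping vector and the part sizes. The sketch below assumes the same 104-row df; g and z are names introduced here for illustration.

```r
df <- data.frame(x = 1:104, y = runif(104))

# rep() yields 1 (x20), 2 (x20), ..., 5 (x20), then 6 for the four leftover rows;
# pmin(5, .) caps that trailing 6 at 5, so the leftovers join the last part
g <- pmin(5, rep(1:nrow(df), each = nrow(df) %/% 5, length.out = nrow(df)))
z <- split(df, g)

sapply(z, nrow)      # part sizes: 20 20 20 20 24
tail(z[["5"]]$x, 4)  # 101 102 103 104
```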
Upvotes: 1
Reputation: 79208
You could do:
p <- 5
n <- nrow(df)
split(df, cummax(as.numeric(gl(p, n%/%p, n))))
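To see why this works, it can help to look at the intermediate vectors. This is just an illustrative sketch for n = 104; g_raw and g are names made up here.

```r
p <- 5
n <- 104  # nrow(df) for the example data

# gl(p, n %/% p, n) builds blocks 1..5 of 20 values each, then recycles:
# the last four values wrap around to level 1
g_raw <- as.numeric(gl(p, n %/% p, n))
tail(g_raw, 6)  # 5 5 1 1 1 1

# cummax() never decreases, so the wrapped 1s are lifted back up to 5
g <- cummax(g_raw)
tail(g, 6)      # 5 5 5 5 5 5
table(g)        # group sizes: 20 20 20 20 24
```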
Upvotes: 5
Reputation: 78927
We could use group_split: use case_when to assign the last 4 rows to id = 5, then group_split to get your list.
library(dplyr)
df1 <- df %>%
mutate(id = rep(row_number(), each=20, length.out = n())) %>%
mutate(id = case_when(id > 5 ~ 5,
TRUE ~ as.numeric(id))) %>%
group_split(id)
Upvotes: 3
Reputation: 160437
set.seed(42)
df <- data.frame(x = 1:104, y = runif(104))
head(df, 3)
# x y
# 1 1 0.9148060
# 2 2 0.9370754
# 3 3 0.2861395
nr <- nrow(df) %/% 5  # rows per part: 104 %/% 5 = 20
df2 <- split(df, pmin(floor(nrow(df)/nr)-1, (seq_len(nrow(df))-1) %/% nr))
sapply(df2, nrow)
# 0 1 2 3 4
# 20 20 20 20 24
lapply(df2, head, 3)
# $`0`
# x y
# 1 1 0.9148060
# 2 2 0.9370754
# 3 3 0.2861395
# $`1`
# x y
# 21 21 0.9040314
# 22 22 0.1387102
# 23 23 0.9888917
# $`2`
# x y
# 41 41 0.37955924
# 42 42 0.43577158
# 43 43 0.03743103
# $`3`
# x y
# 61 61 0.6756073
# 62 62 0.9828172
# 63 63 0.7595443
# $`4`
# x y
# 81 81 0.5816040
# 82 82 0.1579052
# 83 83 0.3590283
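If this is needed for other numbers of parts, the same idea could be wrapped in a small function. The helper below is hypothetical (split_tail_last is not from any package) and numbers the parts from 1 rather than 0; treat it as a sketch that assumes the data frame has at least as many rows as parts.

```r
# Hypothetical helper: split a data frame into p consecutive parts of
# nrow(d) %/% p rows each, with any leftover rows added to the last part.
split_tail_last <- function(d, p) {
  n <- nrow(d)
  stopifnot(p >= 1, n >= p)                    # need at least one row per part
  size <- n %/% p                              # base number of rows per part
  g <- pmin(p, (seq_len(n) - 1) %/% size + 1)  # 1-based part index, capped at p
  split(d, g)
}

df <- data.frame(x = 1:104, y = runif(104))
sapply(split_tail_last(df, 5), nrow)  # part sizes: 20 20 20 20 24
sapply(split_tail_last(df, 4), nrow)  # part sizes: 26 26 26 26
```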
Upvotes: 4