Reputation: 159

Split Dataframe into 5 Parts and Place the Extra Rows at the End

For example, I have a data frame like this:

df <- data.frame(x = 1:104, y = runif(104))

I want to split it into 5 parts, with the extra 4 rows at the end of df placed in the last part of the list.

I've tried the code below, but it either spreads those last 4 rows across the first 4 parts of z (one extra row each) or places them all in the first part of the list.

z <- split(df, rep(1:5, length.out = nrow(df), each = ceiling(nrow(df)/5)))
z <- split(df, rep(1:5, length.out = nrow(df), each = floor(nrow(df)/5)))

What I want is to split df into 5 parts and, if the number of rows in df isn't divisible by 5, put the extra rows in the last part of z.

Thanks, I appreciate your response.

Upvotes: 3

Answers (4)

ThomasIsCoding

Reputation: 101335

If you want to use rep, you can try the code below:

split(df, pmin(5, rep(1:nrow(df), each = nrow(df) %/% 5, length.out = nrow(df))))

which gives

> split(df, pmin(5, rep(1:nrow(df), each = nrow(df) %/% 5, length.out = nrow(df))))
$`1`
    x         y
1   1 0.3550119
2   2 0.6470292
3   3 0.2884533
4   4 0.2585698
5   5 0.2149870
6   6 0.4838779
7   7 0.5563049
8   8 0.7226308
9   9 0.4572367
10 10 0.6881226
11 11 0.4019648
12 12 0.7002047
13 13 0.4223471
14 14 0.5522625
15 15 0.1411530
16 16 0.2255064
17 17 0.5018320
18 18 0.4336198
19 19 0.5347132
20 20 0.9786630

$`2`
    x          y
21 21 0.79745741
22 22 0.93526757
23 23 0.95062238
24 24 0.23647215
25 25 0.68807492
26 26 0.41958031
27 27 0.15992180
28 28 0.87312563
29 29 0.25475743
30 30 0.93269723
31 31 0.44204740
32 32 0.85796635
33 33 0.92365963
34 34 0.09726958
35 35 0.32165244
36 36 0.01040126
37 37 0.30177817
38 38 0.52179936
39 39 0.16403744
40 40 0.45221789

$`3`
    x          y
41 41 0.50989782
42 42 0.35908556
43 43 0.65247200
44 44 0.71794047
45 45 0.29218576
46 46 0.29292081
47 47 0.42704465
48 48 0.78613530
49 49 0.75964984
50 50 0.20222792
51 51 0.39469212
52 52 0.46867551
53 53 0.06541241
54 54 0.54355706
55 55 0.17482056
56 56 0.33274424
57 57 0.59778381
58 58 0.74926797
59 59 0.54474127
60 60 0.68199123

$`4`
    x          y
61 61 0.76397084
62 62 0.57889274
63 63 0.96529543
64 64 0.49457189
65 65 0.13116258
66 66 0.66721471
67 67 0.92634127
68 68 0.02851204
69 69 0.34144727
70 70 0.99429707
71 71 0.40413354
72 72 0.67587272
73 73 0.39743324
74 74 0.41456676
75 75 0.35349059
76 76 0.09776186
77 77 0.90038111
78 78 0.84815278
79 79 0.32220149
80 80 0.89050820

$`5`
      x          y
81   81 0.08374602
82   82 0.28884681
83   83 0.16546834
84   84 0.59190647
85   85 0.79943630
86   86 0.54265060
87   87 0.49021329
88   88 0.56441657
89   89 0.19569261
90   90 0.19457069
91   91 0.95292865
92   92 0.60141956
93   93 0.53222257
94   94 0.53326103
95   95 0.01456218
96   96 0.89709932
97   97 0.41807969
98   98 0.03380717
99   99 0.11439988
100 100 0.96031135
101 101 0.28247647
102 102 0.49453450
103 103 0.99614388
104 104 0.66640157

Upvotes: 1

Onyambu

Reputation: 79208

You could do:

p <- 5
n <- nrow(df)
split(df, cummax(as.numeric(gl(p, n%/%p, n))))
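To see why this works, it can help to inspect the grouping vector before it is passed to split. The sketch below rebuilds the question's example data (the variable names g_raw, g, sizes and parts are only illustrative) and shows that the leftover positions first wrap around to level 1 and are then pulled up to 5 by cummax.

```r
# Illustrative data matching the question: 104 rows, 5 parts.
df <- data.frame(x = 1:104, y = runif(104))
p <- 5
n <- nrow(df)

# gl() repeats each of the p levels n %/% p times and recycles to length n,
# so the 4 leftover positions wrap around to level 1.
g_raw <- as.numeric(gl(p, n %/% p, n))

# cummax() keeps the running maximum, which turns the wrapped 1s into 5s.
g <- cummax(g_raw)

# Group sizes: four parts of 20 rows and a last part of 24 rows.
sizes <- table(g)
print(sizes)

parts <- split(df, g)
```

Note that this relies on the rows already being in the order you want to keep; the grouping is purely positional.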

Upvotes: 5

TarJae

Reputation: 78927

We could use group_split:

1. Create an id that labels each block of 20 rows (the last 4 rows get id = 6).
2. Use case_when to reassign those last 4 rows to id = 5.
3. Use group_split to get your list.

library(dplyr)
df1 <- df %>% 
    mutate(id = rep(row_number(), each=20, length.out = n())) %>% 
    mutate(id = case_when(id > 5 ~ 5,
                          TRUE ~ as.numeric(id))) %>% 
    group_split(id)
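The block size of 20 above is specific to 104 rows. As a rough sketch of the same idea without the hard-coded value, the id can be computed from the row position and capped with pmin. The names parts and out are illustrative, and this assumes the data frame has at least as many rows as parts.

```r
library(dplyr)

# Illustrative data matching the question.
df <- data.frame(x = 1:104, y = runif(104))
parts <- 5  # assumed number of parts

out <- df %>%
  # Block index from row position, capped so leftover rows join the last part.
  mutate(id = pmin(parts, (row_number() - 1) %/% (n() %/% parts) + 1)) %>%
  group_split(id)

# Number of rows in each part.
sapply(out, nrow)
```

The helper id column is kept in each part by default; it can be dropped afterwards if it is not needed.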

Upvotes: 3

r2evans

Reputation: 160437

set.seed(42)
df <- data.frame(x = 1:104, y = runif(104))
head(df, 3)
#   x         y
# 1 1 0.9148060
# 2 2 0.9370754
# 3 3 0.2861395
nr <- nrow(df) %/% 5  # rows per part; 20 here
df2 <- split(df, pmin(floor(nrow(df)/nr)-1, (seq_len(nrow(df))-1) %/% nr))
sapply(df2, nrow)
#  0  1  2  3  4 
# 20 20 20 20 24 
lapply(df2, head, 3)
# $`0`
#   x         y
# 1 1 0.9148060
# 2 2 0.9370754
# 3 3 0.2861395
# $`1`
#     x         y
# 21 21 0.9040314
# 22 22 0.1387102
# 23 23 0.9888917
# $`2`
#     x          y
# 41 41 0.37955924
# 42 42 0.43577158
# 43 43 0.03743103
# $`3`
#     x         y
# 61 61 0.6756073
# 62 62 0.9828172
# 63 63 0.7595443
# $`4`
#     x         y
# 81 81 0.5816040
# 82 82 0.1579052
# 83 83 0.3590283

Upvotes: 4
