Reputation: 115
I posted a similar question before, but my question has now changed, so I thought I'd post again. I have a data table with two columns, number and value, as follows:
number value
1 test1
1 test2
1 test3
2 test4
2 test5
3 test6
3 test7
3 test8
4 test9
5 test10
6 test11
7 test12
8 test13
9 test14
10 test15
11 test16
12 test17
13 test18
14 test19
15 test20
16 test21
17 test22
18 test23
19 test24
20 test25
21 test26
22 test27
23 test28
I would like to export the data table as multiple .txt files. The first text file should contain a subset of the entire data table where number is between 1-20. The second text file should contain a subset of the data where the number is between 21-40, the third where the number is between 41-60 and so on. The data table is dynamic, so the number of .txt files exported will vary.
Furthermore, in all .txt files, the 'number' must be between 1-20. So if the number is 21, it must be renamed to 1, if the number is 22, it must be renamed to 2, etc.
Is anybody able to help? In the example above, there should be 2 .txt files, the first with 25 rows and the second with 3 rows, and the second .txt file containing numbers 21,22,23 renamed to 1,2,3.
Upvotes: 0
Views: 412
Reputation: 7174
Firstly, I would split your data frame into chunks of 20 rows using split(). This function splits a data frame according to a grouping criterion. In your case, the criterion could be: "what is the row number minus one, divided by 20 and rounded down to an integer?" Rows that give the same answer end up in the same chunk.
nrows <- 1:nrow(df)
# subtract 1 so that rows 1-20 fall in chunk 0, rows 21-40 in chunk 1, ...
df <- split(df, floor((nrows - 1)/20))
Edit: If you want to split according to the value in df$number, you should use df <- split(df, floor((df$number-1)/20))
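As a quick sanity check, the split on the number column can be tried on the sample data from the question. This is only a sketch: the data frame is rebuilt by hand here, and the names key and chunks are made up for the illustration.

```r
# Sample data from the question, rebuilt by hand (assumed object name: df)
df <- data.frame(
  number = c(1, 1, 1, 2, 2, 3, 3, 3, 4:23),
  value  = paste0("test", 1:28),
  stringsAsFactors = FALSE
)

# Group key: 0 for numbers 1-20, 1 for numbers 21-40, and so on
key <- floor((df$number - 1) / 20)
chunks <- split(df, key)

# Rows per chunk: chunk "0" has 25 rows, chunk "1" has 3 rows
sapply(chunks, nrow)
```

The row counts match what the question expects for the two output files.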
Secondly, you must subtract the appropriate multiple of 20 from every number higher than 20. I would have used modulo (%% 20), but that also transforms 20 to zero.
ready_for_export <- lapply(df, function(x){
  # subtract 0 for numbers 1-20, 20 for numbers 21-40, 40 for 41-60, ...
  x$number <- (x$number - floor((x$number-1)/20)*20)
  return(x)})
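For what it's worth, the modulo idea can still be made to work by shifting the numbers by one before and after the operation. The sketch below compares the three variants on a few hand-picked numbers; the vector names are only for illustration.

```r
numbers <- c(1, 19, 20, 21, 40, 41)

# Plain modulo maps multiples of 20 to 0, which is not wanted here
plain <- numbers %% 20

# Shifting by one before and after keeps the result in the range 1 to 20
shifted <- (numbers - 1) %% 20 + 1

# The subtraction used in the answer gives the same result as the shifted modulo
subtracted <- numbers - floor((numbers - 1) / 20) * 20
```

Either of the last two forms could be used inside the lapply() call; which one reads better is a matter of taste.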
Finally, save each element of the list ready_for_export in a separate .txt file. I'd use a for-loop for this:
for(i in seq_along(ready_for_export)){
  write.table(ready_for_export[[i]], paste0("test", i, ".txt"))
}
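Note that write.table() by default also writes row names and puts quotes around character values. If plain tab-separated files are wanted, a variant along these lines might be closer to the goal. It is a sketch only: the small list is a made-up stand-in for ready_for_export, and writing to tempdir() is an assumption for the example.

```r
# Made-up stand-in for the list built in the previous step
ready_for_export <- list(
  data.frame(number = c(1, 20), value = c("test1", "test25")),
  data.frame(number = c(1, 2),  value = c("test26", "test27"))
)

out_dir <- tempdir()  # assumption: write to a temporary directory

for (i in seq_along(ready_for_export)) {
  write.table(
    ready_for_export[[i]],
    file = file.path(out_dir, paste0("test", i, ".txt")),
    sep = "\t",          # tab-separated columns
    row.names = FALSE,   # drop the row-name column
    quote = FALSE        # no quotes around character values
  )
}

# Read the first file back to inspect what was written
first_file <- readLines(file.path(out_dir, "test1.txt"))
```

Replace out_dir with the folder where the files should actually end up.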
There are probably packages out there that will make it look nicer and run faster; however, I like to stick with base R as much as possible.
Upvotes: 2
Reputation: 2734
The tidyverse allows you to write a solution that is more... tidy ;)
Say your data is in the variable df:
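library(tidyverse)
df %>%
  # set: consecutive id for each block of 20 numbers that occurs in the data
  mutate(set = plyr::round_any(number - 1, 20, floor) %>% as.factor %>% as.numeric) %>%
  group_by(set) %>%
  # set_num: renumber the distinct values of number within each set, starting at 1
  mutate(set_num = number %>% as.factor %>% as.numeric) %>%
  ungroup ->
  df_prep
# write one tab-separated file per set
df_prep$set %>%
  unique %>%
  walk(~ write_tsv(df_prep %>%
                     filter(set == .x) %>%
                     select(number = set_num,
                            value),
                   paste0("file-", .x, ".tsv")))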
Here the as.factor %>% as.numeric trick assigns consecutive integer ids (1, 2, 3, ...) to the distinct values of a column, in sorted order. The right assignment -> is a bit unusual, but makes the magrittr pipeline flow nicely.
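One thing to be aware of: because factor codes are ranks of the distinct values, the trick reproduces the 21 -> 1, 22 -> 2 mapping only when no number is missing within a set. The sketch below uses a made-up set with a gap and compares it with an arithmetic mapping, which is an alternative and not part of the answer above.

```r
# Numbers in a made-up second set where 22 and 24 are missing
number <- c(21, 23, 23, 25)

# Factor levels are the sorted distinct values, so the codes are dense ranks
dense_rank <- as.numeric(as.factor(number))

# Arithmetic mapping that keeps each number's position within its block of 20
offset <- (number - 1) %% 20 + 1
```

Here dense_rank is 1, 2, 2, 3 while offset is 1, 3, 3, 5, so if gaps can occur, something like the second form inside mutate() may be the safer choice.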
Upvotes: 1