Ash_23S

Reputation: 115

R script to export in batches

I posted a similar question before, but my question has since changed, so I thought I'd post again. I have a data table with two columns, number and value, as follows:

number  value
1   test1
1   test2
1   test3
2   test4
2   test5
3   test6
3   test7
3   test8
4   test9
5   test10
6   test11
7   test12
8   test13
9   test14
10  test15
11  test16
12  test17
13  test18
14  test19
15  test20
16  test21
17  test22
18  test23
19  test24
20  test25
21  test26
22  test27
23  test28

I would like to export the data table as multiple .txt files. The first text file should contain a subset of the entire data table where number is between 1-20. The second text file should contain a subset of the data where the number is between 21-40, the third where the number is between 41-60 and so on. The data table is dynamic, so the number of .txt files exported will vary.

Furthermore, in all .txt files, the 'number' must be between 1-20. So if the number is 21, it must be renamed to 1, if the number is 22, it must be renamed to 2, etc.

Is anybody able to help? In the example above, there should be 2 .txt files, the first with 25 rows and the second with 3 rows, and the second .txt file containing numbers 21,22,23 renamed to 1,2,3.

Upvotes: 0

Views: 412

Answers (2)

KenHBS

Reputation: 7174

Firstly, I would split your data frame into chunks of 20 rows using split(). This function splits your data frame according to some criterion. In your case, the criterion could be something like: "what is the row number, less one, divided by 20 and rounded down to an integer?". Rows that give the same answer end up in the same chunk. (Subtracting one first keeps row 20 in the first chunk.)

nrows <- seq_len(nrow(df))
df    <- split(df, floor((nrows - 1)/20))

Edit: If you want to split according to the value in df$number, you should use df <- split(df, floor((df$number-1)/20))
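To make the number-based split concrete, here is a minimal sketch. The data frame is a hypothetical reconstruction of the table in the question, and the names key and chunks are only illustrative.

```r
# Hypothetical sample data mirroring the question's table (28 rows, numbers 1-23).
df <- data.frame(
  number = c(1, 1, 1, 2, 2, 3, 3, 3, 4:23),
  value  = paste0("test", 1:28)
)

# Group key: 0 for numbers 1-20, 1 for numbers 21-40, 2 for 41-60, ...
key <- floor((df$number - 1) / 20)

# split() returns a named list with one data frame per distinct key.
chunks <- split(df, key)

names(chunks)          # "0" "1"
sapply(chunks, nrow)   # 25 rows in the first chunk, 3 in the second
```

This matches the counts given in the question: 25 rows for numbers 1-20 and 3 rows for numbers 21-23.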


Secondly, you must deduct multiples of 20 from every number higher than 20. I would have used modulo (%% 20), but that also transforms 20 to zero.

ready_for_export <- lapply(df, function(x) {
  # shift number back into the range 1-20
  x$number <- x$number - floor((x$number - 1)/20) * 20
  x
})
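As a side note, the modulo idea can still be made to work by shifting by one before and after. The sketch below uses a made-up vector of numbers and compares the shifted modulo with the floor-based expression from the answer; wrapped and floor_based are illustrative names.

```r
number <- c(1, 19, 20, 21, 22, 40, 41)

# Plain modulo maps multiples of 20 to 0, which is not wanted here.
number %% 20
# 1 19 0 1 2 0 1

# Shifting by one before and after keeps the result in 1..20.
wrapped <- (number - 1) %% 20 + 1
# 1 19 20 1 2 20 1

# Same result as the floor-based expression.
floor_based <- number - floor((number - 1) / 20) * 20
stopifnot(all(wrapped == floor_based))
```

Either form should do; which one reads better is mostly a matter of taste.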

Finally, save the element in the list ready_for_export in separate txt documents. I'd use a for-loop for this:

for(i in seq_along(ready_for_export)){
   write.table(ready_for_export[[i]], paste0("test", i, ".txt"))
}
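By default write.table() also writes row names and puts quotes around character values, which may or may not be what you want in the .txt files. A minimal sketch of a plainer tab-separated export follows; the small list is a hypothetical stand-in for ready_for_export, and writing to tempdir() is just an assumption to keep the example self-contained.

```r
# Hypothetical stand-in for the list built in the previous steps.
ready_for_export <- list(
  data.frame(number = c(1, 2, 20), value = c("a", "b", "c")),
  data.frame(number = c(1, 2),     value = c("d", "e"))
)

out_dir <- tempdir()  # assumption: write to a temporary directory

for (i in seq_along(ready_for_export)) {
  write.table(
    ready_for_export[[i]],
    file      = file.path(out_dir, paste0("test", i, ".txt")),
    sep       = "\t",
    quote     = FALSE,   # no quotes around character values
    row.names = FALSE    # drop the row-name column
  )
}

# Read the second file back to inspect what was written.
written <- readLines(file.path(out_dir, "test2.txt"))
written
```

Each file then holds a header line followed by one tab-separated line per row.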

There are probably packages out there that will make it look nicer and run faster; however, I like to stick with base R as much as possible.

Upvotes: 2

liborm

Reputation: 2734

The tidyverse allows you to write a solution that is more... tidy ;)

Say your data is in variable df:

library(tidyverse)

df %>%
  mutate(set = plyr::round_any(number - 1, 20, floor) %>% as.factor %>% as.numeric) %>% 
  group_by(set) %>%
  mutate(set_num = number %>% as.factor %>% as.numeric) %>%
  ungroup ->
  df_prep

df_prep$set %>%
  unique %>%
  walk(~ write_tsv(df_prep %>% 
                     filter(set == .x) %>%
                     select(number = set_num,
                            value),
                   paste0("file-", .x, ".tsv")))

Here the as.factor %>% as.numeric trick assigns new consecutive numeric ids to the distinct values of the column. The right assignment -> is a bit unusual, but it lets the magrittr pipeline flow nicely.
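One thing worth checking against your data: because the factor trick hands out consecutive ids, it closes any gaps in number, whereas an arithmetic renumbering keeps them. A small base R sketch with made-up values (dense and kept_gaps are illustrative names):

```r
number <- c(21, 21, 23, 26)

# Factor levels are the sorted distinct values; as.numeric() returns their codes.
dense <- as.numeric(as.factor(number))
# 1 1 2 3

# Arithmetic renumbering into 1..20 keeps the gaps instead.
kept_gaps <- (number - 1) %% 20 + 1
# 1 1 3 6
```

If every number within a set is present, as in the question's example, both approaches give the same result.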

Upvotes: 1
