user5989333

Repeat values in fixed number of rows in dataframe in R

I have a dataframe DF1 consisting of 168 file names:

DF1$FileName <- c("File1.csv", "File2.csv",..... "File168.csv")

Using:

# paste0() is vectorized and has no sep argument, so no loop is needed
filez <- paste0("file", 1:168, ".csv")
filesz <- as.data.frame(filez)

I have another dataframe DF2 as follows:

DF2 <- data.frame(RowNumber = rep(1:512000, times = 168))

This means DF2 has a column "RowNumber" that repeats the numbers 1 through 512000 a total of 168 times (86016000 rows in all).

What I want to do is:

  1. Select one file name at a time -> DF1$FileName[i]

  2. Paste it into DF2$FileName for rows 1 to 512000 of that block

  3. Repeat the above until all 86016000 rows are filled in

The end result should look like:

DF2
RowNumber     FileName
1             File1.txt    
2             File1.txt
3             File1.txt
.             .
.             .
.             .
.             .
512000        File1.txt
1             File2.txt
2             File2.txt
3             File2.txt
.             .
.             .
512000        File2.txt
1             File3.txt
2             File3.txt
3             File3.txt
.             .
.             .
512000        File3.txt
.             .
.             .
512000        File167.txt
1             File168.txt
2             File168.txt
3             File168.txt
.             .
.             .
512000        File168.txt

I tried this, but I know there is a logical mistake in it that causes the system to hang:

for (i in 1:nrow(m)){
    while(m$RowNumber[i] != 512000) {m$FileName[i] <- filez[[i]]}
}

Can someone please suggest a better and easier way to solve this?

I am sure R has a package for such operations, but I don't know which one.

Upvotes: 2

Views: 398

Answers (2)

Jaap

Reputation: 83275

There is no need for a for loop in this case. You can use functions designed specifically for building all combinations of two vectors, such as:

1) expand.grid from base R:

filenames <- paste0("file", 1:168, ".csv")
rownumbers <- 1:512000

d <- expand.grid(rownumbers = rownumbers, filenames = filenames)

which gives:

> head(d)
  rownumbers filenames
1          1 file1.csv
2          2 file1.csv
3          3 file1.csv
4          4 file1.csv
5          5 file1.csv
6          6 file1.csv
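
To check the layout without allocating 86 million rows, the same call can be run on a scaled-down example. This is only a minimal sketch: the sizes (3 files, 4 rows each) and the name d_small are illustrative assumptions, and stringsAsFactors = FALSE is added so that the file names stay character instead of becoming a factor, which is the expand.grid default.

```r
# Scaled-down version of the call above (sizes are illustrative assumptions)
filenames  <- paste0("file", 1:3, ".csv")
rownumbers <- 1:4

# expand.grid varies its first argument fastest, so each file name is held
# for a full run of row numbers before the next file name starts
d_small <- expand.grid(rownumbers = rownumbers,
                       filenames  = filenames,
                       stringsAsFactors = FALSE)

print(d_small)
```

Because the first argument varies fastest, putting rownumbers first is what produces the block-per-file layout asked for in the question.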

2) The CJ (cross join) function from the data.table package:

library(data.table)
d <- CJ(rownumbers = rownumbers, filenames = filenames)

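which gives the same set of combinations, but not in the same row order: CJ varies the first column slowest and sorts its inputs by default, so the result is ordered by rownumbers first.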

3) The crossing function from the tidyr package:

library(tidyr)
d <- crossing(rownumbers = rownumbers, filenames = filenames)

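which also gives the same set of combinations. Note that crossing varies the first column slowest and sorts its inputs, so the row order differs from the expand.grid result.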
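
For the exact layout shown in the question (each file name held for a full run of row numbers), a plain rep() call is another option. This is a minimal base R sketch and not one of the three functions listed above; the sizes and the names n_files, n_rows and d_rep are illustrative assumptions.

```r
# Illustrative sizes (assumptions); the question uses 168 files x 512000 rows
n_files <- 3
n_rows  <- 4
filenames <- paste0("file", seq_len(n_files), ".csv")

d_rep <- data.frame(
  RowNumber = rep(seq_len(n_rows), times = n_files),  # 1..n_rows, once per file
  FileName  = rep(filenames, each = n_rows),          # each name held for n_rows rows
  stringsAsFactors = FALSE
)

print(d_rep)
```

Here times = repeats the whole vector, while each = repeats every element in place, which is why the two columns line up block by block.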

Upvotes: 1

David L.

Reputation: 86

The simplest way to do this is with integer division:

for (i in 1:nrow(m)) {
    # rows 1..512000 map to file 1, rows 512001..1024000 to file 2, and so on
    filenum <- 1 + (i - 1) %/% 512000
    filename <- paste0("File", filenum, ".txt")
    # assign the file name, not the bare file number
    m$FileName[i] <- filename
}
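
The same integer-division idea can also be applied to all row indices at once, which avoids assigning into the data frame one row at a time. This is a minimal sketch on a scaled-down m; the size rows_per_file = 4 and the three-file example are illustrative assumptions standing in for 512000 and 168.

```r
# Illustrative sizes (assumptions); the question uses 512000 rows per file
rows_per_file <- 4
m <- data.frame(RowNumber = rep(seq_len(rows_per_file), times = 3))

# Row index i belongs to file number 1 + (i - 1) %/% rows_per_file
filenum <- 1 + (seq_len(nrow(m)) - 1) %/% rows_per_file

# paste0() is vectorized, so the whole column is built in one call
m$FileName <- paste0("File", filenum, ".txt")

print(m)
```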

Hope this helps

Upvotes: 1
