knowone

Reputation: 840

Merge multiple files based on size: Limit resultant file size as well

Merging multiple files into one single file isn't an issue on Unix. However, I want to combine multiple files into fewer files and cap each of those resulting files at a given size.

Here's the full explanation:

1. There are 200 files of varying sizes, ranging from 1 KB to 2 GB.
2. I wish to combine multiple files at random and create multiple files of 5 GB each.
3. So if there are 200 files ranging from 1 KB to 2 GB per file, the resultant set might be 10 files of 5 GB each.

Below is the approach I'm trying, but I couldn't devise the logic and need some assistance:

    for i in `ls /tempDir/`
    do
            if [[ -r $i ]]
            then
                    for files in `find /tempDir/ -size +2G`
                    do
                            cat $files > combinedFile.csv
                    done
            fi
    done

This will only create one file, combinedFile.csv, whatever its size may be. But I need to limit the size of combinedFile.csv to 5 GB and create multiple files: combinedFile_1.csv, combinedFile_2.csv, etc.

I would also like to ensure that, when these merged files are created, no row is broken across two files.

Any ideas how to achieve it?
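The requirement above can be sketched roughly as follows: track the bytes written to the current output file and roll over to a new numbered file once the next input would overflow the cap. Since whole source files are appended (and each presumably ends with a newline), rows are never split. The variable names, the 5 GB cap, and the output naming here are illustrative, not a definitive implementation:

```shell
#!/usr/bin/env bash
# Merge files from srcDir into combinedFile_N.csv chunks of at most
# maxBytes each, never splitting a source file (so rows stay whole).
srcDir=/tempDir
maxBytes=$((5 * 1024 * 1024 * 1024))    # 5 GB cap per combined file

n=1
cur=0                                   # bytes written to the current chunk
for f in "$srcDir"/*; do
    [[ -f $f && -r $f ]] || continue
    sz=$(stat -c %s "$f")               # GNU stat; use `stat -f %z` on BSD
    # Start the next chunk if this file would push us over the cap
    if (( cur > 0 && cur + sz > maxBytes )); then
        n=$((n + 1)); cur=0
    fi
    cat "$f" >> "combinedFile_${n}.csv"
    cur=$((cur + sz))
done
```

Note that a single input file larger than the cap still gets its own chunk, since files are never split.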

Upvotes: 1

Views: 843

Answers (1)

knowone

Reputation: 840

I managed a workaround by cat-ing the files together and then splitting the result, with the code below:

# Process substitution keeps filenames with spaces intact and lets
# `exit` terminate the script (a piped while loop runs in a subshell)
while IFS= read -r files
do
        if [[ -r $files ]]
        then
                cat "$files" >> "${workingDirTemp}/${fileName}"
        else
                echo "Corrupt Files"
                exit 1
        fi
done < <(find "${dir}/" -size +0c -type f)

cd "${workingDir}"
split --line-bytes="${finalFileSize}" --numeric-suffixes -e --additional-suffix=.csv "${fileName}" "${unserInputFileName}_"
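The key property of GNU split's `--line-bytes` (as opposed to `--bytes`) is that it caps each piece at the given size without ever breaking a line across pieces, which is what keeps the CSV rows intact. A toy demonstration with small sizes (the file and prefix names here are made up for the demo):

```shell
# Three 5-byte lines, split with a 6-byte cap per piece: adding a second
# line to any piece would exceed 6 bytes, so each piece gets one whole line.
printf 'aaaa\nbbbb\ncccc\n' > merged.csv
split --line-bytes=6 --numeric-suffixes --additional-suffix=.csv merged.csv part_
# Produces part_00.csv, part_01.csv, part_02.csv, each holding one intact line
```

A line longer than the `--line-bytes` limit is the one exception: split has no choice but to break it, so the limit should comfortably exceed the longest row.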

cat is a CPU-intensive operation for big files (10+ GB). Does anyone have a solution that could reduce the CPU load or increase the processing speed?

Upvotes: 1
