Reputation: 21
Here is the problem.
I have 20 very large files, each approximately 10 GB, and I need to split each bulk file by A) criteria within the record and B) the type of bulk file it is.
Example.
Each bulk file represents an occupation: we have Lawyers, Doctors, Teachers and Programmers. Each of these bulk files contains millions of records, but they belong to a small set of individuals, say 40 different people in total.
A record in the doctor file may look like
XJOHN 1234567 LOREMIPSUMBLABLABLA789
I would need this record from the file to be output into a file called JOHN.DOCTOR.7
JOHN is the person's name, 7 is the last digit in the numeric sequence, and DOCTOR is the file type. I need to do this because of file size limitations.

Currently, I'm using Perl to read the bulk files line by line and print each record into the appropriate output file. I'm opening a new filehandle for each record to avoid having multiple threads writing to the same filehandle and corrupting the data. The program is threaded, one thread per bulk file.

I cannot install any third-party software; assume I only have whatever comes standard with Red Hat Linux. I'm looking for either a Linux command that does this more efficiently, or a better way to do it in Perl.
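For reference, a simplified sketch of what each thread currently does per line (the field positions and the name/number extraction are illustrative, not my exact record layout):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative only: assumes the name is the first whitespace-delimited field
# (minus the leading type character) and the numeric sequence is the second.
my $type = 'DOCTOR';                     # derived from which bulk file this is
open my $in, '<', 'doctor.bulk' or die "open: $!";

while (my $line = <$in>) {
    my ($name_field, $number) = split ' ', $line;
    my $name  = substr $name_field, 1;   # drop the leading 'X'
    my $digit = substr $number, -1;      # last digit of the numeric sequence

    my $outfile = "$name.$type.$digit";  # e.g. JOHN.DOCTOR.7

    # A new filehandle is opened for every single record:
    open my $out, '>>', $outfile or die "open $outfile: $!";
    print {$out} $line;
    close $out;
}
close $in;
```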
Thanks!
Upvotes: 0
Views: 671
Reputation: 3484
An alternative approach is to use processes instead of threads, via Parallel::ForkManager.
Additionally, I would consider using a map/reduce approach by giving each process/thread its own work directory, in which it would write intermediate files, one per doctor, lawyer, etc.
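A rough sketch of that map phase, assuming the same record layout as in the question (the field positions and file names are illustrative): each child process gets its own work directory and keeps its output filehandles open in a hash instead of reopening one per record.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
use File::Path qw(make_path);
use File::Basename qw(basename);

# Hypothetical mapping from bulk file to its occupation type.
my %type_of = (
    'doctor.bulk' => 'DOCTOR',
    'lawyer.bulk' => 'LAWYER',
    # ... one entry per bulk file
);

my $pm = Parallel::ForkManager->new(20);   # up to one child per bulk file

for my $bulk (keys %type_of) {
    $pm->start and next;                   # parent moves on; child continues here

    my $type    = $type_of{$bulk};
    my $workdir = 'work.' . basename($bulk);
    make_path($workdir);

    my %fh;                                # cache: one open handle per output key
    open my $in, '<', $bulk or die "open $bulk: $!";
    while (my $line = <$in>) {
        my ($name_field, $number) = split ' ', $line;
        my $name  = substr $name_field, 1;
        my $digit = substr $number, -1;
        my $key   = "$name.$type.$digit";

        unless ($fh{$key}) {
            open $fh{$key}, '>>', "$workdir/$key"
                or die "open $workdir/$key: $!";
        }
        print { $fh{$key} } $line;
    }
    close $_ for values %fh;
    close $in;

    $pm->finish;                           # end of this child
}
$pm->wait_all_children;
```

With roughly 40 people, a handful of occupations and 10 possible trailing digits, each child holds at most a few hundred open handles, which is well within default limits, and because each child owns its work directory exclusively there is no contention between writers.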
I would then write a second program, the reducer, which could be a very short shell script, to concatenate the intermediate files into their respective final output files.
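For example, the reduce step might be little more than this (the work.* directory naming follows the sketch above and is only an assumption):

```sh
#!/bin/sh
# Concatenate each per-process intermediate file into its final output file.
# Assumes the children wrote files named NAME.TYPE.DIGIT under work.* directories.
for f in $(find work.* -type f -printf '%f\n' | sort -u); do
    cat work.*/"$f" > "$f"
done
```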
Upvotes: 1