alienCoder

Reputation: 1481

Best way to store 4.7 million binary files

I have parsed the whole English Wikipedia and saved each parsed article in a separate protocol buffer file. Each file has a unique id (wikiid). I now have 4.7 million parsed articles with a total size of 180 GB. I know ext4 can handle this number of files, but is it good practice, or should I use a database? I will not need to update the data frequently.

Upvotes: 1

Views: 538

Answers (1)

abhi-rao

Reputation: 2785

Keep them as files; a database is relatively more expensive to scale and maintain. Be careful about how you name and store them, though: instead of putting all 4.7M files in one directory, use a directory structure that goes, say, 4 levels deep. Preprocess the 4.7M files to place them in that structure. For example, if a file is named D1D2D3D4fewmorechars.txt, store it at /D1/D2/D3/D4/D1D2D3D4fewmorechars.txt.
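A minimal Python sketch of that layout, under some assumptions that are mine rather than the question's: the wikiid is used as the file name, the extension is `.pb`, and ids shorter than 4 characters are zero-padded for sharding only. The helper names `sharded_path` and `store` are hypothetical.

```python
from pathlib import Path
import shutil


def sharded_path(root, file_id, levels=4, suffix=".pb"):
    """Build root/D1/D2/D3/D4/<file_id><suffix> from the id's leading characters."""
    name = str(file_id)
    # Zero-pad short ids so there are always `levels` characters to shard on.
    # The padding affects only the directory names, not the file name.
    padded = name.zfill(levels)
    parts = list(padded[:levels])
    return Path(root).joinpath(*parts, name + suffix)


def store(src, root, file_id, levels=4, suffix=".pb"):
    """Move an existing file into its sharded location, creating directories as needed."""
    dest = sharded_path(root, file_id, levels, suffix)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

With decimal ids and 4 levels there are at most 10,000 leaf directories, so 4.7M files would average about 470 per directory if the ids were spread evenly. Leading digits of sequential ids are usually not spread evenly, so it may be worth sharding on the trailing digits instead.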

The other option is to use a file system such as XFS or ext3/4, which use directory indexing techniques such as hashed directories. See this link: https://serverfault.com/questions/43133/filesystem-large-number-of-files-in-a-single-directory

Upvotes: 2
