Dmitry Serebrennikov

Reputation: 139

Advice for choosing a Linux filesystem for Neo4j

I'm looking for advice on choosing and configuring a Linux filesystem for storing a Neo4j database. Of course one should always test with one's own workload, but in general, is there any advice on which filesystem performs best?

Based on this page http://grokbase.com/t/gg/neo4j/131grvg09k/best-filesystem-for-new-neo4j-persistant-storage, and if I understand Neo4j's write patterns correctly, XFS would be preferred.

I've also read the two Linux-specific pages from the manual, but neither seems to give guidance for choosing the filesystem.

If there is a difference in choosing a filesystem for HDD vs. SSD, please share your thoughts on both. If there are special considerations for EC2/EBS, I would also love to know, as that is where I'm currently running.

If it matters, here's information on the type of graph and workload I'm planning to house:
* product catalog-style graph, with 100s of millions of nodes with large and small properties, and billions of relationships
* main use: traversals of 100-500K node subsets to answer queries (desired sub-second response)
* periodic updates of 100-500K nodes via bulk uploads (20-30 minutes is OK for this)
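To get a feel for how large the store files for a graph like this might be, here is a rough back-of-the-envelope sketch. It assumes the fixed record sizes documented for Neo4j 2.0-era stores (14 bytes per node, 33 per relationship, 41 per property record); treat these as assumptions and check the manual for your version. The counts used below (500M nodes, 3B relationships, 1B property records) are hypothetical values picked from the ranges above, and the separate string/array stores for large property values are not included.

```python
# Rough on-disk size estimate for the fixed-size Neo4j store files.
# ASSUMPTION: Neo4j 2.0-era record sizes; other versions differ.
NODE_RECORD_BYTES = 14
REL_RECORD_BYTES = 33
PROPERTY_RECORD_BYTES = 41  # one record can hold several small values


def estimate_store_bytes(nodes: int, relationships: int, property_records: int) -> dict:
    """Return estimated sizes in bytes of the node, relationship and property stores."""
    return {
        "nodestore": nodes * NODE_RECORD_BYTES,
        "relationshipstore": relationships * REL_RECORD_BYTES,
        "propertystore": property_records * PROPERTY_RECORD_BYTES,
    }


if __name__ == "__main__":
    # Hypothetical figures loosely matching the workload described above.
    sizes = estimate_store_bytes(500_000_000, 3_000_000_000, 1_000_000_000)
    for name, size in sizes.items():
        print(f"{name:>18}: {size / 2**30:8.1f} GiB")
    print(f"{'total':>18}: {sum(sizes.values()) / 2**30:8.1f} GiB")
```

With those inputs the estimate comes to 7 GB of node records, 99 GB of relationship records and 41 GB of property records (decimal GB), which gives a sense of how much of the store could realistically be kept in RAM.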

Thanks so much!

Upvotes: 0

Views: 627

Answers (2)

BraveNewCurrency

Reputation: 13065

I'm not sure about the specifics for Neo4j, but MongoDB works much better on XFS. Ext3/ext4 did not handle allocating sparse files efficiently (30 s vs. 0.1 s on XFS), nor deleting many files quickly.

That said, the advice about benchmarking is good. I wouldn't consider any filesystems besides ext4 and XFS until Btrfs is production-ready.
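A crude way to reproduce this kind of comparison on your own machines is to time preallocating one large file and creating/deleting many small files on each candidate filesystem. This is a minimal sketch, not a real benchmark (a dedicated tool such as fio is better for that). It uses `os.posix_fallocate` where available and otherwise falls back to `ftruncate`, which only creates a sparse file; the 64 MiB size, the 1000-file count and the optional directory argument are illustrative choices.

```python
import os
import sys
import tempfile
import time


def time_preallocate(directory: str, size_bytes: int) -> float:
    """Preallocate one file of size_bytes in directory; return seconds taken."""
    path = os.path.join(directory, "prealloc.bin")
    start = time.perf_counter()
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        try:
            os.posix_fallocate(fd, 0, size_bytes)
        except (AttributeError, OSError):
            # posix_fallocate missing (e.g. macOS) or unsupported here:
            # fall back to a sparse file, which is NOT a real preallocation.
            os.ftruncate(fd, size_bytes)
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    os.remove(path)
    return elapsed


def time_create_and_delete(directory: str, count: int) -> tuple[float, float]:
    """Create `count` one-byte files, then delete them; return (create_s, delete_s)."""
    paths = [os.path.join(directory, f"f{i:07d}") for i in range(count)]
    start = time.perf_counter()
    for p in paths:
        with open(p, "wb") as f:
            f.write(b"x")
    created = time.perf_counter() - start
    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    deleted = time.perf_counter() - start
    return created, deleted


if __name__ == "__main__":
    # Optional argument: a directory on the filesystem under test
    # (defaults to the system temp directory).
    target = sys.argv[1] if len(sys.argv) > 1 else None
    with tempfile.TemporaryDirectory(dir=target) as d:
        print(f"preallocate 64 MiB: {time_preallocate(d, 64 * 2**20):.3f} s")
        created, deleted = time_create_and_delete(d, 1000)
        print(f"create 1000 files: {created:.3f} s, delete: {deleted:.3f} s")
```

Run it once per filesystem, e.g. `python fsbench.py /mnt/xfs` and `python fsbench.py /mnt/ext4` (hypothetical script name and mount points), and scale the sizes up toward what your store will actually look like.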

Upvotes: 0

Stefan Armbruster

Reputation: 39915

If your application is mainly read-driven, I wouldn't worry too much about choosing the right filesystem. Focus instead on sizing the memory-mapped I/O (MMIO) caches to fit your store files: once the cache is warmed up, a read operation will not touch the I/O subsystem at all.
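In the Neo4j 1.9/2.x releases current at the time, these caches were configured per store file in `neo4j.properties` through settings such as `neostore.nodestore.db.mapped_memory`. Below is a minimal sketch, assuming that layout (each store file `<name>` in the `graph.db` directory has a matching `<name>.mapped_memory` setting), which reads the actual file sizes and suggests values covering each file plus some headroom. The 10% headroom and the default `graph.db` path are arbitrary choices, and newer Neo4j versions replaced these per-file settings with a single page cache size, so check the docs for your version.

```python
import os
import sys

# ASSUMPTION: Neo4j 1.9/2.x-era store layout, where each store file <name>
# has a matching "<name>.mapped_memory" setting in neo4j.properties.
STORE_FILES = [
    "neostore.nodestore.db",
    "neostore.relationshipstore.db",
    "neostore.propertystore.db",
    "neostore.propertystore.db.strings",
    "neostore.propertystore.db.arrays",
]


def mapped_memory_settings(store_dir: str, headroom_pct: int = 10) -> dict:
    """Suggest mapped_memory values (whole MiB, rounded up) covering each
    store file plus headroom_pct percent for growth. Missing files are skipped."""
    settings = {}
    for name in STORE_FILES:
        path = os.path.join(store_dir, name)
        if not os.path.exists(path):
            continue
        wanted = os.path.getsize(path) * (100 + headroom_pct) // 100
        mib = max(1, -(-wanted // 2**20))  # integer ceiling division, at least 1 MiB
        settings[f"{name}.mapped_memory"] = f"{mib}M"
    return settings


if __name__ == "__main__":
    store = sys.argv[1] if len(sys.argv) > 1 else "graph.db"
    for key, value in mapped_memory_settings(store).items():
        print(f"{key}={value}")
```

The printed `key=value` lines can be pasted into `neo4j.properties`. If their total exceeds the RAM you can spare, reads will regularly hit the disk, and that is when the filesystem and SSD/HDD choice starts to matter more.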

For write operations, however, SSDs generally perform much better than HDDs. Ext4 seems to be the most widely used filesystem for Neo4j. On EC2 you might benefit from using SSD-backed instances, but this depends on the amount of write operations you do.
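If you are unsure whether a given volume is SSD- or HDD-backed, Linux exposes a per-device flag in sysfs at `/sys/block/<dev>/queue/rotational` (1 for rotational, 0 for non-rotational). This small sketch reads it for every block device. Note that on virtualized storage such as EBS the flag reflects what the guest's driver reports, which may not match the physical media, so treat it as a hint only.

```python
import os


def block_devices_rotational(sys_block: str = "/sys/block") -> dict:
    """Map each block device name to True (rotational, HDD-like) or
    False (non-rotational, e.g. SSD) using the Linux sysfs flag
    <sys_block>/<dev>/queue/rotational. Returns {} if sys_block is absent."""
    result = {}
    if not os.path.isdir(sys_block):
        return result
    for dev in sorted(os.listdir(sys_block)):
        flag = os.path.join(sys_block, dev, "queue", "rotational")
        try:
            with open(flag) as f:
                result[dev] = f.read().strip() == "1"
        except OSError:
            continue  # device without a readable queue/rotational entry
    return result


if __name__ == "__main__":
    for dev, rotational in block_devices_rotational().items():
        print(f"{dev}: {'rotational (HDD-like)' if rotational else 'non-rotational (SSD-like)'}")
```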

Generally speaking, it's best practice to generate a graph database of approximately the size of the intended production system and run checks beforehand. Premature optimization is usually a waste of effort.
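A minimal sketch for producing such a synthetic test graph: it writes random nodes and relationships to CSV files with a mix of small and large property values, loosely mirroring the catalog graph in the question. The column layout (`id,name,description` for nodes and `start,end,type` for relationships), the `RELATED_TO` type and the roughly 1-in-4 share of large descriptions are all hypothetical; adapt them to whatever import tool and data shape you actually use.

```python
import csv
import os
import random


def write_synthetic_graph(out_dir: str, nodes: int, rels_per_node: int,
                          seed: int = 42) -> tuple[str, str]:
    """Write nodes.csv and rels.csv describing a random graph; return their paths.

    Hypothetical layout: nodes.csv has id,name,description and
    rels.csv has start,end,type."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    os.makedirs(out_dir, exist_ok=True)
    node_path = os.path.join(out_dir, "nodes.csv")
    rel_path = os.path.join(out_dir, "rels.csv")
    with open(node_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "name", "description"])
        for i in range(nodes):
            # Mostly small property values, with about 1 in 4 large ones.
            desc_len = rng.choice([16, 16, 16, 2048])
            w.writerow([i, f"product-{i}", "x" * desc_len])
    with open(rel_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["start", "end", "type"])
        for i in range(nodes):
            for _ in range(rels_per_node):
                w.writerow([i, rng.randrange(nodes), "RELATED_TO"])
    return node_path, rel_path
```

Start small, then increase `nodes` and `rels_per_node` until the loaded store approaches the size you expect in production, and run your real traversal queries against it.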

Upvotes: 1
