dan decaupian

Reputation: 1

ManifoldCF and PostgreSQL to crawl 1.5 million documents

We use ManifoldCF with PostgreSQL (9.6) to crawl our websites. The crawl speed is good (approximately 20,000 docs/hour) up to about 500,000 documents. After that, performance degrades and we see very long freezes in the crawl. We suspect that PostgreSQL is rebuilding the indexes of the intrinsiclink table. Is it possible to prevent this, for example through PostgreSQL settings?

Thank you, Dan

Upvotes: 0

Views: 159

Answers (1)

J Zou

Reputation: 108

Which MCF version are you using? Try the latest version, 2.13.

Most of the time it is the database that drags down performance. Tuning PostgreSQL more carefully will give better results.

See the MCF performance tuning guide: https://manifoldcf.apache.org/release/release-2.13/en_US/performance-tuning.html

Following that guide, you should turn off PostgreSQL autovacuuming and see if that helps.
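As a rough sketch of what turning autovacuum off could look like (not taken from the MCF guide): this assumes superuser access, uses the intrinsiclink table name from the question, and assumes you will schedule a manual vacuum yourself during a quiet period. Note that PostgreSQL still runs anti-wraparound vacuums even when autovacuum is off.

```sql
-- Option 1 (assumption: you want it off for the whole server).
-- ALTER SYSTEM writes to postgresql.auto.conf and is available since 9.4.
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();  -- autovacuum takes effect on reload, no restart needed

-- Option 2 (assumption: only the suspected table is the problem).
ALTER TABLE intrinsiclink SET (autovacuum_enabled = false);

-- With autovacuum disabled, dead rows are no longer cleaned up automatically,
-- so run a manual vacuum while the crawler is paused or idle.
VACUUM ANALYZE intrinsiclink;

-- Check how many dead rows have accumulated and when the table was last vacuumed.
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'intrinsiclink';
```

If the freezes continue with autovacuum off, the cause is probably elsewhere, and the other settings in the tuning guide are worth checking.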

There are many other tuning factors to try.

Upvotes: 0
