GaneshK
GaneshK

Reputation: 39

Alfresco db content size and contentstore size: How to match?

I have an SQL query to get the content size from a content URL, but when I execute the SQL and compare the result to the directory size of contentstore it doesn't match. For example first I get the content size from database (PostgreSQL):

SELECT count(*),
       SUM(round(u.content_size/1024/1024,2)) AS "Size (MB)",
       to_char(n.audit_created:: date,'yyyy') AS yyyy_
  FROM alf_node            AS n,
       alf_node_properties AS p,
       alf_node_properties AS p1,
       alf_namespace       AS ns,
       alf_qname           AS q,
       alf_content_data    AS d,
       alf_content_url     AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
GROUP BY yyyy_
ORDER BY yyyy_ DESC;

It gives me a total size of 600 GB. But when I see the size property for folder content store in alf_data it shows 2.4 TB. How is that possible?

Upvotes: 4

Views: 2192

Answers (2)

Marco Altieri
Marco Altieri

Reputation: 3818

It is not so easy to know the space of the contentstore based only on what you currently have in your repository.

When a document is deleted or updated with a new version, Alfresco does not immediately delete the binary from the disk.

The process to actually free the disk space is a bit more "complex". How to configure the first step of this process is described in the following article: Cleaning up orphaned content (purge)

The "purge" by default is configured to be executed every day at 4AM and will purge any orphaned file that has been removed from the repository at least 14 days earlier.

As described in the article, by default, Alfresco does not delete the purged files but it only moves them in a different folder (contentstore.deleted).

It is safe to delete the files in the contentstore.deleted after a backup.

As usual, in alfresco-global-properties, you can modify the default configuration (that you can find in alfresco/repository.properties)

You can end up with a lot of orphaned files in many ways. Just to give some examples:

  1. the purge is not working for some reason
  2. someone recently has imported and then deleted many and/or big files
  3. someone has used the alfresco export, that creates the zip file in the repository
  4. the system is used to store short-living documents that are often deleted or updated (like big videos), etc...

Upvotes: 6

Younes Regaieg
Younes Regaieg

Reputation: 4156

  • First, if you are on windows, you should be aware of the difference between size and size on disk
  • You should be aware of the difference between MagaByte (MB), MebiByte (MiB) and MegaBit (Mbit)
  • And last, but not least, you're rounding up file sized in MB, for each file before summing that up, try summing in bytes then rounding the results like this:

    SELECT count(*),round(SUM(u.content_size)/1024/1024, 3) AS "Size (MB)", to_char(n.audit_created:: date,'yyyy') as yyyy_
    FROM alf_node AS n,
    alf_node_properties AS p,
    alf_node_properties AS p1,
    alf_namespace AS ns,
    alf_qname AS q,
    alf_content_data AS d,
    alf_content_url AS u
    WHERE n.id=p.node_id
    AND ns.id=q.ns_id
    AND p.qname_id=q.id
    AND p.long_value=d.id
    AND d.content_url_id=u.id
    AND p1.node_id=n.id
    AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
    group by  yyyy_
    ORDER BY yyyy_ DESC;
    

Upvotes: 5

Related Questions