Jason Chen

Reputation: 11

Get UNIX directory tree structure into JSON object

I'm trying to build a browser application that visualizes file structures, so I want to print the file structure into a JSON object.

I've tried many variations of ls piped to sed, but find seems to work best.

Right now I'm just trying to use the command

find ~ -maxdepth ? -name ? -type d -print

And tokenize the path variables

I've tried simple AJAX with PHP's exec() for this, but walking the resulting array is really slow. I was thinking of doing it straight from a bash script, but I can't figure out how to pass associative arrays by reference so I can recursively add all the tokenized path components to the tree.
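For what it's worth, the tokenize-and-insert idea itself is simple when the language has real nested dictionaries. Here's a minimal sketch in Python (the sample paths are made up; in practice you'd feed it the lines that find prints):

```python
import json

def paths_to_tree(paths):
    """Fold slash-separated paths (e.g. lines from `find ... -type d`)
    into a nested dict, one dict level per path component."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            # descend into (or create) the child for this component
            node = node.setdefault(part, {})
    return tree

paths = [
    "home/jason",
    "home/jason/projects",
    "home/jason/projects/viz",
    "home/jason/docs",
]
print(json.dumps(paths_to_tree(paths), indent=2))
```

Because each level is just a dict, there's no pass-by-reference problem: setdefault() returns the existing child or creates it, so deeper paths extend the same structure.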

Is there a better or established way to do this?

Thanks!

Upvotes: 1

Views: 2705

Answers (3)

Cristian

Reputation: 733

9 years later... Using tree should do the job.

tree ~ -J -L ? -P '?' -d --noreport

where:

  • -J output as JSON
  • -L max display depth (equivalent to find's -maxdepth)
  • -P pattern to include (equivalent to find's -name)
  • -d directories only (equivalent to find's -type d)

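For reference, tree's -J output is an array of node objects; the exact shape below (the "type", "name", and "contents" keys) is an assumption based on tree's documented JSON format, so verify against your installed version. A minimal Python sketch of consuming it:

```python
import json

# Illustrative stand-in for the output of `tree -J -d --noreport`
# (assumed shape: array of directory objects with "contents" children).
sample = """
[
  {"type": "directory", "name": "home", "contents": [
    {"type": "directory", "name": "projects", "contents": []},
    {"type": "directory", "name": "docs", "contents": []}
  ]}
]
"""

def dir_names(nodes):
    """Collect directory names depth-first from a tree -J style structure."""
    names = []
    for node in nodes:
        if node.get("type") == "directory":
            names.append(node["name"])
            names.extend(dir_names(node.get("contents", [])))
    return names

print(dir_names(json.loads(sample)))
```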
Upvotes: 6

ssokolow

Reputation: 15355

Walking the disk is always going to be slower than ideal, simply because of all the seeking that needs to be done. If that's not a problem for you, my advice would be to work to eliminate overhead... starting with minimizing the number of fork() calls. Then you can just cache the result for however long you feel is appropriate.

Since you've already mentioned PHP, my suggestion is to write your entire server-side system in PHP and use the DirectoryIterator or RecursiveDirectoryIterator classes. Here's an SO answer for something similar to what you're asking for implemented using the former.

If disk I/O overhead is a problem, my advice is to implement a system along the lines of mlocate which caches the directory listing along with the directory ctimes and uses stat() to compare ctimes and only re-read directories whose contents have changed.

I don't do much filesystem work in PHP, but, if it'd help, I can offer you a Python implementation of the basic mlocate-style updatedb process. (I use it to index files which have to be restored from DVD+R manually if my drive ever fails because they're too big to fit on my rdiff-backup target drive comfortably)

Upvotes: 0

Nathan Smith

Reputation: 321

I don't know what your application's requirements are, but one solution that solves your problem (and a number of other problems) is to hide the actual file system layout behind an abstraction layer.

Essentially, you write two threads. The first scrapes the file structures and creates a database representation of their contents. The second responds to browser requests, queries the database created by the first thread, and generates your JSON (i.e. a normal web request handler thread).

By abstracting the underlying storage structure (the file system), you create a layer that can add concurrency, deal with IO errors, etc. When someone changes a file within the structure, it's not visible to web clients until the "scraper" thread detects the change and updates the database. However, because web requests are not tied to reading the underlying file structure and merely query a database, response time should be fast.
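A minimal sketch of that two-sided design, assuming SQLite as the database and with illustrative names throughout (the real scraper would walk the filesystem on a timer instead of being handed a path list):

```python
import json
import sqlite3

# Shared database; in a real system this would be a file on disk
# accessed by both the scraper and the request handler.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dirs (path TEXT PRIMARY KEY)")

def scrape(paths):
    """Scraper side: record the currently observed directory paths."""
    with db:
        db.executemany("INSERT OR REPLACE INTO dirs VALUES (?)",
                       [(p,) for p in paths])

def handle_request():
    """Web side: answer from the database only, never the filesystem."""
    rows = db.execute("SELECT path FROM dirs ORDER BY path").fetchall()
    return json.dumps([r[0] for r in rows])

scrape(["home", "home/docs", "home/projects"])
print(handle_request())
```

The key property is that handle_request() touches no disk directories at all, so web response time is bounded by a database query regardless of how slow or large the underlying filesystem is.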

HTH, nate.

Upvotes: 2
