Ashish
Ashish

Reputation: 8529

Efficient search algorithm for files with specifed extension C/C++

I want to implement faster directory search. Is there any algorithm in c/c++ is available for that ?

Upvotes: 2

Views: 2841

Answers (4)

Searching is an OS feature these days, and those who are trying to implement third party indexing are giving up. Even Google Desktop is not being updated and most consider it dead:

https://superuser.com/questions/194082/is-google-desktop-search-a-dead-project

If you install a search server on someone's computer and get caught hogging disk and CPU--and you do not have a very, VERY good reason for doing so--you will not only waste a lot of time writing code and patching bugs but you will also alienate your users.

For most cross-platform apps, letting the users find the files in the Explorer/Finder/Nautilus and then making your app accept multi-file drag and drops is a better answer. Also, most "common dialogs" for opening files provide built in search functionality now.

If you're trying to write a search-accelerated tool for a specific platform, hook into that platform's API, which may even permit you to supplement its index. Here's Microsoft's Programmatic Search API:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb266517(v=vs.85).aspx

OS/X has the spotlight API:

http://developer.apple.com/library/mac/#documentation/Carbon/Conceptual/SpotlightQuery/SpotlightQuery.html

I'm not quite sure if there's "canon" for search in the Linux world, but most all of the relevant versions of Ubuntu now ship with Tracker:

http://live.gnome.org/Tracker/Documentation

Upvotes: 0

Mooing Duck
Mooing Duck

Reputation: 66922

Here's a windows solution (http://ideone.com/5dFVf)

class file_iterator : std::iterator<std::output_iterator_tag, const WIN32_FIND_DATA> {
    HANDLE handle;
    WIN32_FIND_DATA fdata;
public:
    file_iterator() :handle(NULL) { 
#ifdef _DEBUG
        memset(&fdata, 0, sizeof(fdata); 
#endif //_DEBUG
    }
    file_iterator(const std::wstring& path) :handle(FindFirstFile(path.c_str(), &fdata)) {}
    file_iterator(file_iterator&& b) :handle(b.handle) {b.handle = NULL;}
    file_iterator& operator=(file_iterator&& b) {close(); handle = b.handle; b.handle = NULL;}

    void close() {
        if (handle) 
            FindClose(handle); 
#ifdef _DEBUG
        memset(&fdata, 0, sizeof(fdata); 
#endif //_DEBUG
    }

    const WIN32_FIND_DATA& operator*() {return fdata;}
    file_iterator& operator++() {if (FindNextFile(handle , &fdata)==false) close(); return *this;}
    bool operator==(const file_iterator& b) {return handle == b.handle;}
    bool operator!=(const file_iterator& b) {return handle != b.handle;}
};

std::vector<std::wstring> 
    find_files_with_extension(
        const std::wstring& folder, 
        const std::wstring& extension, 
        std::vector<std::wstring>* result=NULL) 
{
    std::wstring filepath = folder + L"/*";
    std::vector<std::wstring> local_result;
    std::deque<std::wstring> todo;
    if (result == NULL) 
        result = &local_result;
    file_iterator iter(filepath);
    while(iter != file_iterator()) {
       std::wstring folder_file((*iter).cFileName);
       if ((*iter).dwFileAttributes | FILE_ATTRIBUTE_DIRECTORY)
           todo.push_back(folder_file);
       else if (folder_file.size() > extension.size() && folder_file.substr(folder_file.size()-extension.size())==extension)
           result->push_back(folder_file);
       ++iter;
    }
    for(int i=0; i<todo.size(); ++i)
        find_files_with_extension(todo[i], extension, result);
    return *result;
}

This uses a breadth-first search, which takes a little more RAM and is slightly more complicated, but faster due to caching.

Upvotes: 1

frankc
frankc

Reputation: 11473

There isn't a C++ thing per se, but usually directory search is slow because of IO, and because you must stat each file (or whatever the OS equivalent is non-unix systems) to find out anything besides its name. One way to make this faster would be to keep write a server that keeps the inodes and filenames in memory. Of course the difficulty is that the inode information is not static. You would need to listen to file system changes to keep your cache up to date. That is definitely possible in linux, but I have no experience with it on other systems. As you can see, another theme of this problem is that it is very system and possibly filesystem dependent. Maybe a system-independent library like Boost::Filesystem can help, but I doubt it implements directory update callbacks.

Maybe just install Google Desktop?

Upvotes: 1

Tio Pepe
Tio Pepe

Reputation: 3089

Check boost::filesystem library on http://www.boost.org/doc/libs/1_47_0/libs/filesystem/v3/doc/index.htm, there you have a recursive_directory_iterator class.

Upvotes: 4

Related Questions