Reputation: 11562
Would it be possible to use Git as a hierarchical text database?
Obviously you would have to write a front end that would act as a middle man, translating user commands into git commands.
A record would correspond to a "file". In the "file", the text would have to have some kind of conventional format like:
[name]: John Doe
[address]: 13 Maple Street
[city]: Plainview
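To make the record convention concrete, here is a minimal sketch of reading and writing that format. The names `parse_record` and `format_record` are made up for illustration, and the sketch assumes one field per line with no multi-line values.

```python
import re

# Matches lines such as "[name]: John Doe" (the convention proposed above).
FIELD_LINE = re.compile(r"^\[(?P<key>[^\]]+)\]:\s*(?P<value>.*)$")


def parse_record(text):
    """Return a dict of field name -> value; lines that do not match are skipped."""
    record = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            record[match.group("key")] = match.group("value")
    return record


def format_record(record):
    """Serialize a dict back into the one-field-per-line text format."""
    return "\n".join(f"[{key}]: {value}" for key, value in record.items()) + "\n"


sample = "[name]: John Doe\n[address]: 13 Maple Street\n[city]: Plainview\n"
parsed = parse_record(sample)
```

Every query would have to run something like `parse_record` on each file it touches, which matters for the performance discussion in the answer.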
To do queries, you would have to write a grep front end to use git's search capability.
The database itself would be the repository.
The directory structure would be the hierarchical structure of the database.
The tricky part I see would be that you want the records to be in memory, not usually files on the drive (although that would be possible). So you would have to configure git to be working with files in a virtual file system that was actually in the memory of the db middleware.
Kind of a crazy idea, but would it work?
Potential Advantages:
Upvotes: 3
Views: 1631
Reputation: 165546
Yes, but it would be very slow and it wouldn't involve git. The functionality of git grep and git clone is available without git.
Filesystems can be used as certain types of databases. In fact, git itself uses the filesystem as a simple, reliable, fast, robust key/value store. Object 4fbb4749a2289a3cd949ebe08255266befd18f23 is in .git/objects/4f/bb4749a2289a3cd949ebe08255266befd18f23. The commit that the master branch points to is recorded in .git/refs/heads/master.
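The key-to-path mapping can be sketched in a few lines. This only covers loose objects and loose refs as described above (git can also pack both, and packed data is not laid out this way), and the function names are hypothetical.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def loose_object_path(object_id, git_dir=".git"):
    """Map a 40-character SHA-1 object id to its loose-object path."""
    if len(object_id) != 40 or not set(object_id) <= HEX_DIGITS:
        raise ValueError("expected a 40-character lowercase hex object id")
    # The first two hex digits name the directory, the remaining 38 the file.
    return PurePosixPath(git_dir) / "objects" / object_id[:2] / object_id[2:]


def branch_ref_path(branch, git_dir=".git"):
    """Path of the file that records which commit a local branch points to."""
    return PurePosixPath(git_dir) / "refs" / "heads" / branch
```

So a lookup by key is just one path computation and one file read, which is why this works well as a key/value store.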
What filesystem databases are very bad at is searching the contents of those files. Without indexing, you have to look at every file every time. You can use basic Unix file utilities like find and grep for it.
In addition, you'd have to parse the contents of the files on each search, which can be expensive and complicated.
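To show what "look at every file every time" means in practice, here is a rough sketch of a query over a directory of record files. It assumes the `[field]: value` format from the question; `search_records` is a made-up name, not anything git or the OS provides.

```python
import os


def search_records(root, field, wanted):
    """Return paths of record files under root whose field equals wanted.

    There is no index, so every file is opened and scanned on every call.
    """
    needle = f"[{field}]: {wanted}"
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if any(line.strip() == needle for line in handle):
                    hits.append(path)
    return sorted(hits)
```

The cost of each query grows with the total size of the database, whatever the query is. An indexed database avoids exactly this.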
Concurrency becomes a serious issue. If multiple processes want to work on a change at the same time, each has to copy the whole repository and working directory, which is very expensive. Then they need to do a remote merge, which is also expensive and may result in a conflict. Remote access has the same problem.
As to having the files in memory, your operating system will take care of this for you. It will keep frequently accessed files in memory.
Addressing the specific points...
all the records would be hashed with SHA-1 so there would be high integrity
This only tells you that a file is different, or that someone has tampered with the history. In a database, files are supposed to change. The hash doesn't tell you whether the content is corrupted, malformed, or simply the result of a normal change.
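A small sketch of the point about hashing: git names a blob by the SHA-1 of a short header plus the file bytes, so the id identifies content and says nothing about whether that content is valid. The sample records below are invented for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob with these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"[name]: John Doe\n"
edited = b"[name]: Jane Doe\n"         # a legitimate edit
malformed = b"name = John Doe ???\n"   # breaks the record convention

# All three get a perfectly good id; the ids differ, and nothing more.
ids = {git_blob_id(original), git_blob_id(edited), git_blob_id(malformed)}
```

The malformed record hashes just as happily as the valid ones. Validation would still have to live in your middleware.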
git takes care of all the persistence problems
Not sure what that means.
db operations like edits can be managed as git merges
They're files, edit them. I don't know how merging gets involved.
Merging means conflicts which means human intervention, not something you want in a database.
db operations like record deletes can be managed as removals (rm)
If each single file is a record, yes, but you can do the same thing without git.
all changes to the database are stored, so you can recover ANY change or previous state
This is an advantage: it sort of gives you transactions. But it will also make writing to your database supremely slow. Git is not meant to be committing hundreds of times a second.
making copies of the database can be done with clone
cp -r does the same thing.
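If the middleware is written in Python rather than shell, the equivalent of a recursive copy is a single standard-library call. A sketch, with `copy_database` as a hypothetical helper name:

```python
import shutil


def copy_database(source_dir, target_dir):
    """Recursively copy a directory of record files, like a recursive cp.

    copytree creates target_dir itself and raises FileExistsError
    if it already exists.
    """
    return shutil.copytree(source_dir, target_dir)
```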
In short, unless you're doing a very simple key/value store, there is very little advantage to using a filesystem as a database. Something like SQLite or Berkeley DB is superior in almost every way.
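For comparison, here is roughly what the same records look like in SQLite using Python's built-in `sqlite3` module. The table layout, index name, and second sample row are assumptions made for the example. `":memory:"` keeps the whole database in RAM, which also covers the in-memory requirement from the question.

```python
import sqlite3

# An in-memory database; pass a filename instead to persist it on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT)"
)
# The index lets lookups by city avoid scanning every row.
conn.execute("CREATE INDEX people_city ON people (city)")

# "with conn" wraps the inserts in a transaction: commit on success, rollback on error.
with conn:
    conn.executemany(
        "INSERT INTO people (name, address, city) VALUES (?, ?, ?)",
        [
            ("John Doe", "13 Maple Street", "Plainview"),
            ("Jane Roe", "7 Oak Avenue", "Springfield"),  # invented sample row
        ],
    )

rows = conn.execute(
    "SELECT name FROM people WHERE city = ? ORDER BY name", ("Plainview",)
).fetchall()
```

Parsing, indexing, and transactions all come for free here. What you don't get is history of every change, so if that's the real requirement you'd need to add it yourself, for example with an audit table.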
Upvotes: 2