Filesystem-backed data structure?

Question

Imagine a data structure like so:

public class Cat
{
    public string Name;
    public string FavoriteFood;
    public List<Memory> Memories;
}

public class Memory
{
    public string Name;
    public DateTime Date;
    public List<string> Thoughts;
}

Sometimes, Cats will have many Memories, each with many thoughts. This can take an extraordinary amount of space, so keeping it in memory might not be the best idea. What's the best way to back this data with files and folders?

This would help not only with memory efficiency but also with usability, since someone could browse the data by hand. An ideal filesystem layout might look like this:

\---Cats
    +---Charles
    |   |   cat.json
    |   |
    |   \---Memories
    |       |   eating_food.json
    |       |   sleeping.json
    |       |   biting_some_dude.json
    |
    \---Brumpbo
        |   cat.json
        |
        \---Memories
            |   sleeping.json
            |   sleeping_again.json

cat.json files might look something like this:

{
    "name": "Charles",
    "favorite_food": "pant",
    "memories": [
        "eating_food",
        "sleeping",
        "biting_some_dude"
    ]
}

Memory files might look something like this (note that thoughts could be extremely long):

{
    "name": "eating_food",
    "date": "2009-01-20T12:00:00.000Z",
    "thoughts": [
        "God, I love pant.",
        "This is some great pant.",
        // ...
        "I am never going to eat ever again.",
        "This was a mistake."
    ]
}
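
As an illustration of how a memory file of this shape could map onto a C# type, here is a minimal sketch using System.Text.Json. The MemoryInfo and MemoryJson names are assumptions made for this example, and the comment-skipping option is only there because the sample above contains a // line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO mirroring one memory file; the names are illustrative.
public class MemoryInfo
{
    [JsonPropertyName("name")]
    public string Name { get; set; } = "";

    [JsonPropertyName("date")]
    public DateTime Date { get; set; }

    [JsonPropertyName("thoughts")]
    public List<string> Thoughts { get; set; } = new List<string>();
}

public static class MemoryJson
{
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        WriteIndented = true,                           // keep files readable by hand
        ReadCommentHandling = JsonCommentHandling.Skip, // tolerate // comments in files
        AllowTrailingCommas = true
    };

    // Turn the text of a memory file into a MemoryInfo.
    public static MemoryInfo Parse(string json) =>
        JsonSerializer.Deserialize<MemoryInfo>(json, Options)
        ?? throw new JsonException("Memory file was empty.");

    // Turn a MemoryInfo back into indented JSON text.
    public static string Format(MemoryInfo info) =>
        JsonSerializer.Serialize(info, Options);
}
```

The explicit JsonPropertyName attributes keep the on-disk keys lowercase, as in the samples, without relying on a naming policy.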

My first attempt at implementing this was to use IDisposable for serialization.

public class Cat : IDisposable
{
    public string Name;
    public string FavoriteFood;
    public List<string> Memories;

    // Load a cat if it already exists, or create a new one.
    public Cat(string name)
    {
        if (Storage.DirectoryExists(name))
        {
            var info = Storage.ReadFile<CatInfo>($"{name}/cat.json");
            this.Name = info.Name;
            this.FavoriteFood = info.FavoriteFood;
            this.Memories = info.Memories;
        }
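        else
        {
            // New cat: keep the requested name so Dispose writes to the right folder.
            this.Name = name;
            this.Memories = new List<string>();
        }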
    }

    public Memory GetMemory(string name)
    {
        if (this.Memories.Contains(name))
        {
            return new Memory(this, name);
        }
        return null;
    }

    // Serialize and store the cat.
    public void Dispose()
    {
        var info = new CatInfo
        {
            Name = this.Name,
            FavoriteFood = this.FavoriteFood,
            Memories = this.Memories
        };
        Storage.WriteFile($"{this.Name}/cat.json", info);
    }
}

public class Memory : IDisposable
{
    private readonly Cat cat;

    public string Name;
    public DateTime Date;
    public List<string> Thoughts;

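    public Memory(Cat cat, string name)
    {
        // Keep the owning cat; Dispose needs it to build the file path.
        this.cat = cat;

        if (Storage.FileExists($"{cat.Name}/Memories/{name}.json"))
        {
            var info = Storage.ReadFile<MemoryInfo>($"{cat.Name}/Memories/{name}.json");
            this.Name = info.Name;
            this.Date = info.Date;
            this.Thoughts = info.Thoughts;
        }
        else
        {
            // New memory: keep the requested name so Dispose writes to the right file.
            this.Name = name;
            this.Thoughts = new List<string>();
        }
    }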

    public void Dispose()
    {
        var info = new MemoryInfo
        {
            Name = this.Name,
            Date = this.Date,
            Thoughts = this.Thoughts
        };
        Storage.WriteFile($"{this.cat.Name}/Memories/{this.Name}.json", info);
    }
}

Terrible as this may be, it works quite well until one issue comes up: thread safety. Suppose Charles the Cat discovers he likes to eat "bread" more than he likes to eat "pant." This requires two changes: one to the Cat.FavoriteFood field, and an addition to Cat.Memories. However, these two changes are likely handled by two separate threads in the application, each working on its own loaded copy of Charles. This could result in a loss of data.

Thread 1: Charles is loaded to update FavoriteFood.
Thread 2: Charles is loaded to update Memories.
Thread 1: Charles's FavoriteFood is updated to "bread."
Thread 2: Charles's Memories is updated to include "eating_bread."
Thread 1: Charles's data is serialized and written. 
Thread 2: Charles's data is serialized and written.

Because Thread 2 loaded Charles before Thread 1 wrote the new favorite food, and then wrote its own stale copy afterwards, the update to FavoriteFood is completely lost.
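
The interleaving above can be replayed deterministically without real threads. This is only a sketch: CatInfo here is a stand-in for the contents of cat.json, and FakeDisk is a hypothetical in-memory replacement for Storage that copies on Load and Save, the way deserializing and serializing would.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the contents of cat.json.
public class CatInfo
{
    public string Name = "";
    public string FavoriteFood = "";
    public List<string> Memories = new List<string>();

    public CatInfo Clone() => new CatInfo
    {
        Name = Name,
        FavoriteFood = FavoriteFood,
        Memories = new List<string>(Memories)
    };
}

// Hypothetical in-memory "disk": every Load and Save works on a copy.
public class FakeDisk
{
    private CatInfo stored;
    public FakeDisk(CatInfo initial) { stored = initial.Clone(); }
    public CatInfo Load() => stored.Clone();
    public void Save(CatInfo info) { stored = info.Clone(); }
}

public static class LostUpdateDemo
{
    // Replays the six steps in order and returns what ends up stored.
    public static CatInfo Run()
    {
        var disk = new FakeDisk(new CatInfo { Name = "Charles", FavoriteFood = "pant" });

        var copy1 = disk.Load();              // Thread 1 loads Charles
        var copy2 = disk.Load();              // Thread 2 loads Charles
        copy1.FavoriteFood = "bread";         // Thread 1 updates FavoriteFood
        copy2.Memories.Add("eating_bread");   // Thread 2 updates Memories
        disk.Save(copy1);                     // Thread 1 writes
        disk.Save(copy2);                     // Thread 2 writes its stale copy over it

        return disk.Load();
    }
}
```

After Run, the stored cat has the new memory but its FavoriteFood is still "pant", because the last whole-object write wins.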

A solution to this could be to move the read/modify/write operation into a property for every field, but this seems incredibly inefficient, especially when considering a hypothetical data type with dozens of properties.
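
To make the read/modify/write idea concrete, here is a minimal sketch under some assumptions: CatStore and its in-memory field are hypothetical stand-ins for the file on disk, and a single lock guards each cycle. The FavoriteFood property shows the per-property form described above, while Update takes a delegate so that one method could serve any number of fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the contents of cat.json.
public class CatInfo
{
    public string Name = "";
    public string FavoriteFood = "";
    public List<string> Memories = new List<string>();

    public CatInfo Clone() => new CatInfo
    {
        Name = Name,
        FavoriteFood = FavoriteFood,
        Memories = new List<string>(Memories)
    };
}

// Hypothetical store: one lock guards every read/modify/write cycle.
public class CatStore
{
    private readonly object gate = new object();
    private CatInfo stored; // stands in for cat.json on disk

    public CatStore(CatInfo initial) { stored = initial.Clone(); }

    public CatInfo Read()
    {
        lock (gate) { return stored.Clone(); }
    }

    // Reload, apply the change, and write back while holding the lock.
    public void Update(Action<CatInfo> change)
    {
        lock (gate)
        {
            var fresh = stored.Clone(); // read the latest state
            change(fresh);              // modify only what the caller touches
            stored = fresh;             // write
        }
    }
}

public class Cat
{
    private readonly CatStore store;
    public Cat(CatStore store) { this.store = store; }

    // Per-property read/modify/write.
    public string FavoriteFood
    {
        get => store.Read().FavoriteFood;
        set => store.Update(c => c.FavoriteFood = value);
    }

    public void AddMemory(string name) => store.Update(c => c.Memories.Add(name));
}
```

Because each change is applied to freshly read state inside the lock, a FavoriteFood update and a Memories update no longer overwrite each other. Note that a lock statement only coordinates threads within one process; separate processes sharing the same files would need a different mechanism.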

To be clear, the goal here is a thread-safe method for storing data on the disk in a human-accessible manner; this doesn't necessarily mean using JSON or even text files. What's the best solution here?
