Reputation: 4055
I'm trying to build a custom Artisan command that cleans up my filesystem each day. There are a few steps to it, and I can't figure out a way to do it without exhausting memory. There are thousands of folders and hundreds of thousands of users.
What I need to do is:
$user->isCompleted()
In my filesystem I store data for each user ID by day, so the filesystem looks like this:
data/2022-01-15/7482947
data/2022-01-15/7482946
data/2022-01-15/7482945
data/2022-01-16/2353234
data/2022-01-16/2353233
data/2022-01-16/2353232
The format is:
data/<date>/<user_id>
So far I have managed to return the folders that are older than 50 days using the code below, but I am unsure how to continue: getting the nested folders and then querying the DB.
collect(Storage::directories('data/'))
    ->filter(function ($directory) {
        $directoryDate = Str::after($directory, '/');

        // Keep only directories that are at least 50 days old
        return Carbon::parse($directoryDate)->lte(now()->subDays(50));
    });
Any help would be greatly appreciated.
Upvotes: 2
Views: 264
Reputation: 703
Get the names of all folders older than 50 days.
Get all the user IDs from the folder names.
Check in the DB whether each user is completed.
Delete only the completed users' folders.
collect(Storage::directories('data/'))
    ->each(function ($directory) {
        $directoryDate = Str::after($directory, '/');

        // Skip directories that are not yet older than 50 days
        if (! Carbon::parse($directoryDate)->lte(now()->subDays(50))) {
            return;
        }

        // Get all user ids from the folder names under data/<date>
        $userIds = collect(Storage::directories('data/'.$directoryDate))
            ->map(function ($userDirectory) use ($directoryDate) {
                return Str::after($userDirectory, 'data/'.$directoryDate.'/');
            });

        // Fetch only the completed user ids, using one query per date folder
        // instead of one query per user folder, to reduce DB reads
        $completedUserIds = User::where('is_complete', 1)
            ->whereIn('id', $userIds->all())
            ->pluck('id');

        // Delete the folders of the completed users
        foreach ($completedUserIds as $completedUserId) {
            Storage::deleteDirectory('data/'.$directoryDate.'/'.$completedUserId);
        }
    });
We fetch all the user folder IDs for a date folder and check them against the DB in a single query, which returns only the completed users. Once we have those, we delete their folders. You can run this logic from a scheduled job.
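Since the question mentions hundreds of thousands of users, a single whereIn over every ID in a date folder could get large. Below is a minimal sketch in plain PHP (no Laravel needed) of the two pure steps: picking the expired date folders and splitting the user IDs into batches, so each batch can be passed to its own whereIn query. The function names, the default batch size of 1000, and the assumption that user IDs are integers are all hypothetical choices for illustration, not part of the original code.

```php
<?php

// Hypothetical helper: keep only "data/<date>" directories whose date
// is at least $days days before $today. Names that are not Y-m-d dates are skipped.
function expiredDateDirectories(array $directories, DateTimeImmutable $today, int $days = 50): array
{
    $cutoff = $today->modify("-{$days} days");

    return array_values(array_filter($directories, function (string $dir) use ($cutoff) {
        // The leading "!" resets the time part to 00:00:00
        $date = DateTimeImmutable::createFromFormat('!Y-m-d', basename($dir));

        return $date !== false && $date <= $cutoff;
    }));
}

// Hypothetical helper: turn "data/<date>/<user_id>" paths into batches of
// integer user ids, so each batch can go into a separate whereIn query.
function userIdBatches(array $userDirectories, int $batchSize = 1000): array
{
    $ids = array_map(fn (string $dir) => (int) basename($dir), $userDirectories);

    return array_chunk($ids, $batchSize);
}
```

In the Laravel code, each batch returned by userIdBatches() would replace $userIds->all() in the whereIn call, so no single query or result set has to cover a whole date folder.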
Upvotes: 1
Reputation: 751
Something to think about.
Whenever I run into a situation where I am manipulating big data sets or doing memory-heavy work, I always try to split the heaviest logic into jobs.
So, in this case, I'd make a top-level command that dispatches multiple jobs, each of which checks the users and queries the DB.
This gives you the ability to scale, and when using something like Laravel Horizon, it gives you the option to run multiple jobs simultaneously!
When making the job, I'd want to make sure that each job is a unique instance of the logic being run, so I'd implement the ShouldBeUnique interface on the job and return the user ID or the subdirectory from its uniqueId() method.
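As a rough sketch of that idea: in Laravel, job uniqueness is provided by the ShouldBeUnique interface together with a uniqueId() method. The job below handles one date folder and uses the folder path as its unique ID. The class name CleanDateDirectory, the App\Models\User namespace, the chunk size, and the is_complete column (borrowed from the other answer) are assumptions, so adapt them to your app. It needs PHP 8 for the promoted constructor property.

```php
<?php

namespace App\Jobs;

use App\Models\User; // assumed model location
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Storage;
use Illuminate\Support\Str;

// Hypothetical job: cleans up one "data/<date>" directory.
class CleanDateDirectory implements ShouldQueue, ShouldBeUnique
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(public string $directory)
    {
    }

    // Only one job per date directory may be queued at a time.
    public function uniqueId(): string
    {
        return $this->directory;
    }

    public function handle(): void
    {
        collect(Storage::directories($this->directory))
            ->chunk(1000) // assumed batch size, keeps each whereIn small
            ->each(function ($userDirectories) {
                $userIds = $userDirectories
                    ->map(fn ($dir) => Str::afterLast($dir, '/'));

                // "is_complete" is an assumed column name
                User::where('is_complete', 1)
                    ->whereIn('id', $userIds->all())
                    ->pluck('id')
                    ->each(fn ($id) => Storage::deleteDirectory($this->directory.'/'.$id));
            });
    }
}
```

The top-level command would then only list the date folders older than 50 days and call CleanDateDirectory::dispatch($directory) for each one, leaving the per-user work to the queue workers.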
Hopefully, this gives you some ideas!
Upvotes: 0