Marcin

Reputation: 329

Why does a file in JFFS2 get corrupted after power loss during removal of other files?

I am working with a Linux (3.4.31+) embedded system booting from a JFFS2 partition. I frequently encounter file corruption when power loss occurs while other files are being removed. It happens during the platform's upgrade procedure. These are the simplified steps of the upgrade:

  1. Download a tar.gz containing (among other files) a rootfs.squashfs image of the file system being upgraded to, and verify the md5 checksum of the image.
  2. Boot linux from a small JFFS2 partition that has a minimal set of tools required to perform upgrade.
  3. Mount the large partition that must be upgraded.
  4. Mount rootfs.squashfs, which is stored on the large partition.
  5. Remove all files from the large partition except for some migrated data files, the rootfs.squashfs image, etc.
  6. Copy all files from the mounted rootfs.squashfs to the large partition.
  7. Boot from the large partition.
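Steps 5-6 plus an explicit flush can be sketched as follows (the mount point, exclusion list, and helper name are hypothetical; the actual upgrade script will differ):

```shell
#!/bin/sh
# Sketch of steps 5-6 with hypothetical paths. Requires a find that
# supports -delete (GNU findutils or BusyBox).
wipe_except() {
    dir=$1   # partition mount point, no trailing slash
    # Delete everything except the squashfs image and a data/ subtree.
    find "$dir" -mindepth 1 \
        ! -name 'rootfs.squashfs' \
        ! -path "$dir/data" ! -path "$dir/data/*" \
        -delete
    sync     # push the deletions to flash before copying the new rootfs
}

# Usage (hypothetical mount points):
# wipe_except /mnt/large
# cp -a /mnt/newroot/. /mnt/large/ && sync
```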

The mentioned power loss occurs in step 5. Note that rootfs.squashfs is mounted read-only and is never altered during the upgrade. Even so, this file gets corrupted: after the device is powered back on, the file's md5 checksum has changed. Its size stays unchanged and the image can still be mounted, but some of the files inside it can no longer be read.

Why does this file get corrupted? Shouldn't JFFS2 deal with this kind of scenario? Is there any way to recover from this situation?

Upvotes: 2

Views: 2350

Answers (1)

minghua

Reputation: 6581

A while ago I did see corruption of files that were open and being written to. Waiting longer than the fs commit time (5 seconds by default) solved the problem. That means in your step 1, after extracting all the files from the tar.gz, a sleep of 7 seconds will allow the fs to settle down and get flushed to the flash. If that works for you, let us know.
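A minimal sketch of that settle step (the 7-second value is this answer's suggestion relative to the default 5-second commit interval, not a verified constant):

```shell
#!/bin/sh
# Flush dirty data to flash, then give JFFS2's commit/GC time to settle.
# Default delay of 7 s is a guess: longer than the 5 s commit interval.
settle() {
    sync
    sleep "${1:-7}"
}

# e.g. after extracting the tarball in step 1:
# tar xzf upgrade.tar.gz -C /mnt/large && settle
```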

A partition small enough that the GC collects too often or too early could erase previous logs prematurely. That could subsequently cause a rollback that is too shallow, so files could end up in a corrupted state. This is my reading of the JFFS2 algorithms, not verified with experts or in practice yet.

Given these views, a sleep of 7 seconds would be needed after touching files (writing or deleting).

Maybe two sets of the same files are required. Each set would be written apart from the previous set by an interval longer than the commit time, e.g. 7 seconds. After power-up, determine which set is still valid and use that one.
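The power-up selection could be sketched like this (the setA/setB layout and the per-set checksum file are hypothetical, and each set's md5 is assumed to have been recorded after a settle period):

```shell
#!/bin/sh
# Pick the first set whose image still matches its recorded checksum.
# Layout assumed: $1/setA/rootfs.squashfs + rootfs.md5, same for setB.
pick_valid_set() {
    for set in "$1/setA" "$1/setB"; do
        [ -f "$set/rootfs.squashfs" ] && [ -f "$set/rootfs.md5" ] || continue
        if [ "$(md5sum "$set/rootfs.squashfs" | cut -d' ' -f1)" = \
             "$(cat "$set/rootfs.md5")" ]; then
            echo "$set"
            return 0
        fi
    done
    return 1   # neither set survived intact
}
```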

There has been very little information about JFFS2. Some of my views are just guesses, and some of those guesses were supported by testing under limited conditions, so I cannot guarantee the views are right. When I sift through the kernel commits in the jffs2 area, it is obvious that it is very hard to track which version has which bugs and when those bugs were fixed. Maybe if you try a different kernel version the problem will be different.

Upvotes: 1

Related Questions