This post proposes an explanation for behavior observed in AWS whereby the most recent snapshot of an EBS volume is all that is required to restore the entire volume. The motivation for writing this post stems from an internal discussion on EBS snapshot behavior between my colleague Dan Rivera and me. Feedback from readers on the validity (or lack thereof) of the information in this post is not only welcomed, but encouraged.
First, What is EBS?
EBS, short for Elastic Block Store, is an Amazon Web Services storage service that provides persistent, block-level storage devices for attachment to EC2 instances. Once attached, an EBS volume behaves like a typical local drive: it can be formatted, and applications can be installed on it. A volume can also be detached and attached to other instances, though only to one instance at a time, and only within a single Availability Zone.
From a resiliency perspective, one of the challenges with EBS volumes is that they do not replicate outside of the Availability Zone in which they are created. Any data on a volume is therefore isolated to a single Availability Zone, which is less than ideal.
One means to increase the resiliency of data on EBS volumes is to snapshot them regularly. Snapshots capture a point-in-time backup of the volume, which can be used to restore that volume in the event of a disaster. One highly beneficial aspect of snapshotting EBS volumes is that those snapshots are automatically stored in S3. By storing snapshots in S3, the data on the volume is replicated throughout all of the Availability Zones within the region in which the snapshot was created. This improves the resiliency of EBS-stored data significantly by no longer limiting it to a single Availability Zone.
The way snapshots in AWS work is that the first snapshot of a volume is a full snapshot, and all subsequent snapshots are incremental. In other words, aside from the first snapshot, only blocks that have changed on the volume since the previous snapshot are captured in each successive snapshot.
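This full-then-incremental behavior can be sketched with a toy model, assuming a volume is represented as a dict mapping block indexes to block contents. All names and the block layout here are invented for illustration; real EBS internals are opaque.

```python
# Toy model of EBS-style snapshotting. A "volume" is a dict of
# block-index -> block contents; a "chain" is a list of snapshots
# ordered oldest to newest. Illustrative only -- not AWS's mechanism.

def restore(chain):
    """Rebuild a volume by replaying snapshots oldest to newest."""
    volume = {}
    for snapshot in chain:
        volume.update(snapshot)
    return volume

def take_snapshot(volume, chain):
    """Append a snapshot capturing only blocks changed since the last
    snapshot. With an empty chain, every block differs from nothing,
    so the first snapshot is automatically a full replica."""
    base = restore(chain)  # the volume as of the previous snapshot
    chain.append({i: b for i, b in volume.items() if base.get(i) != b})

chain = []
volume = {0: "boot", 1: "os"}
take_snapshot(volume, chain)   # first snapshot: full (both blocks)
volume[1] = "os-patched"
take_snapshot(volume, chain)   # second snapshot: incremental (one block)
```

Replaying the chain oldest to newest always reproduces the volume as of the latest snapshot, which is the property the rest of this post leans on.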
Let’s explore this a bit more through an example scenario, which we’ll imagine is being conducted on an instance launched from an Amazon Linux AMI with a single 8GB EBS-backed root volume1:
1. A snapshot is taken of the root volume directly after the instance completes its status checks and is made available for use.
Despite no changes having been made to the underlying data on the volume, this first snapshot is a full replica of the entire volume.
2. A log file is created in /tmp containing text that reads, “Cloud is awesome.” The file is saved.
3. A second snapshot is taken after this log file is saved.
This second snapshot is incremental, and contains only the blocks that have changed since the first snapshot.
4. We edit the log file in /tmp and replace “Cloud is awesome.” with “Cloud is awesome!” (note the replacement of the period with an exclamation mark). The file is saved.
5. A third snapshot is taken after the edited log file is saved.
This third snapshot is incremental, and contains only the blocks that have changed since the second snapshot.
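Continuing the illustration, the three snapshots in this scenario could be sketched as block diffs. The block numbering and contents are invented for illustration; real EBS blocks are opaque, fixed-size chunks.

```python
# Illustrative sketch of the three snapshots above, modeling the
# volume as a dict of block-index -> block contents.

def diff_blocks(volume, base):
    """Return only the blocks of `volume` that differ from `base`."""
    return {i: b for i, b in volume.items() if base.get(i) != b}

volume = {0: "boot", 1: "os", 2: ""}             # freshly launched instance
snap1 = diff_blocks(volume, {})                  # full: every block differs from nothing

volume[2] = "Cloud is awesome."                  # create the /tmp log file
snap2 = diff_blocks(volume, {**snap1})           # incremental: just the log block

volume[2] = "Cloud is awesome!"                  # the punctuation edit
snap3 = diff_blocks(volume, {**snap1, **snap2})  # incremental: one block again
```

In this model snap1 holds every block, while snap2 and snap3 each hold only the single block touched by the log file.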
EBS Volume Snapshot Deletion
In the example above, it is this third snapshot that piques our curiosity. The reason is that, according to Amazon Web Services, it is possible to restore the entire volume from the final snapshot (snapshot three) alone.
“Even though snapshots are saved incrementally, the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume.”
To be more clear, if we were to:
1. Delete the instance from which the snapshots were created
2. Delete snapshot 1 (full)
3. Delete snapshot 2 (incremental)
We would still be able to restore the entire volume from the sole remaining third incremental snapshot, which contains nothing more than a simple modification to punctuation in a log file.
How could this be possible?
Let me preface this section with a disclaimer of sorts – the Amazon Web Services platform is designed in such a way that its intricate inner workings are abstracted away from its users. Therefore, some guesswork is required when attempting to explain certain observable behaviors like this one. With that out of the way, here begins my guesswork =)
Quite simply, what could make this phenomenon possible is that a merge of the blocks in these snapshots is occurring, which makes this somewhat of a consolidation as well as a delete operation. Note also that deleting the first (full) snapshot versus one of the successive snapshots should invoke quite different behaviors.
From the AWS CLI reference for the ‘delete-snapshot’ command:
“When you delete a snapshot, only the data not needed for any other snapshot is removed. So regardless of which prior snapshots have been deleted, all active snapshots will have access to all the information needed to restore the volume.”
When issuing a delete of a snapshot in Amazon Web Services that is not the first (full) snapshot, the blocks of the snapshot targeted for deletion are probably compared with those of the next most recent snapshot in the chain. Wherever both snapshots contain a version of a block, the newer snapshot’s version wins and the older block is discarded. Blocks in the targeted snapshot that no newer snapshot overrides are still needed to restore the later snapshots, so they are presumably rolled forward into the next snapshot in the chain. Hence, the targeted snapshot is effectively deleted without losing any data the remaining snapshots depend on.
When issuing a delete of a snapshot in Amazon Web Services that is the first (full) snapshot, its blocks are probably consolidated into the next most recent snapshot, with the newer snapshot’s blocks again winning wherever the two overlap. By deleting the oldest snapshots and leaving the most recent one untouched, the data from all snapshots has effectively been rolled up into the last remaining, most recent snapshot. This allows the entire volume to be fully recovered from that single remaining snapshot.
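This consolidation guess can be sketched with the same toy model of a volume as a dict of block-index to contents (illustrative only; not AWS’s actual mechanism):

```python
# Sketch of snapshot deletion with roll-up consolidation. A "chain"
# is a list of snapshots ordered oldest to newest; each snapshot is
# a dict of block-index -> block contents.

def restore(chain):
    """Rebuild the volume by replaying snapshots oldest to newest."""
    volume = {}
    for snapshot in chain:
        volume.update(snapshot)
    return volume

def delete_snapshot(chain, index):
    """Delete chain[index], rolling forward any of its blocks that the
    next newer snapshot does not already override, so the remaining
    chain can still restore the full volume."""
    doomed = chain.pop(index)
    if index < len(chain):                     # a newer snapshot exists
        for i, block in doomed.items():
            chain[index].setdefault(i, block)  # newer data wins on conflict

# The three snapshots from the example scenario:
chain = [
    {0: "boot", 1: "os", 2: ""},   # snapshot 1: full
    {2: "Cloud is awesome."},      # snapshot 2: log file created
    {2: "Cloud is awesome!"},      # snapshot 3: punctuation change
]
delete_snapshot(chain, 0)  # delete the full snapshot
delete_snapshot(chain, 0)  # delete the (former) second snapshot
# The single remaining snapshot now holds every block of the volume.
```

After both deletions, the lone surviving third snapshot contains the boot and OS blocks rolled forward from its ancestors plus its own edited log block, so restoring from it alone yields the complete volume.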
Both of these hypotheses could be tested by observing the snapshot file sizes in S3; however, that information is not available for viewing.
Another explanation could be that the first snapshot is never truly deleted until the most recent snapshot is removed. The first snapshot is the key: when merged with the most recent snapshot, it allows an administrator to recover the entire volume as it appeared when that most recent snapshot request was submitted. This explanation seems less plausible than the first, but it remains a possibility.
It appears that at any given time, S3 contains all of the data needed to restore a volume to the state it was in at the point a particular snapshot request was issued. This is true even if the final snapshot contains only a very insignificant difference in its changed blocks compared to its antecedent snapshot. Considering that the data is being “rolled up”, or consolidated, from each snapshot you delete in the chain2, your volume can truly be restored from the most recent snapshot alone, regardless of how insignificant the update.
What do you think? Please feel free to share your thoughts in the comments section below.
1. Root Volume – System volume, typically containing the installation of the operating system↩
2. Assuming your delete requests occur from oldest to newest. Deleting from newest to oldest would not invoke this roll-up process, I imagine↩