Using APFS On HDDs … And Why You Might Not Want To

Apple File System

After 16 months of using and testing APFS—Apple’s new file system—I’ve come to the conclusion that you probably don’t want to use it on HDDs (disks with rotating platters).

Why? Well, to understand why APFS and HDDs are not well suited to each other, I first need to explain one of the key features of APFS: “copy on write”. “Copy on write” is the magic behind the snapshot feature in APFS and also allows you to copy really large files in only a couple of seconds. However, to fully understand the “copy on write” process, and the implications of using APFS with HDDs, it helps first to know how copying works with HFS Extended volumes…

Copying a file on an HFS Extended volume
HFS Extended is the file system Apple has been using for almost 20 years, the one which all Macs running macOS 10.12 or earlier use for their startup volumes.

For my example, I am using a 10 GB movie file, “Nina’s Birthday.mp4”, which is stored in two separate blocks of data on the volume. When I play this movie file on my computer, my Mac will first read the first block and then go straight on to read the second block; it seamlessly moves from one block to the next so that, to the viewer, the movie appears as if it was a single block of data. Files on your Mac can exist in one or many blocks. Small files usually exist in one block whereas larger files are often broken up into 2 or more blocks so they can fit into the available free space in a volume.

Unlike SSDs, HDDs are mechanical devices with spinning disks (aka platters) containing your volume’s data, and heads that move over the disk in order to read that data. When an HDD has to go to a new part of a disk, there is a delay while the head moves to the new location and waits for the correct part of the disk platter to be under the head so it can start reading. This delay is usually 4–10 msec (1/250–1/100 of a second). You probably won’t notice a delay when reading a file which is in 2 or 3 blocks, but reading a file which is made up of 1,000 or 10,000 blocks could be painfully slow.
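To put rough numbers on that delay, here is a small Python sketch. It is only a back-of-the-envelope model: it assumes one head repositioning per block and ignores the time spent actually transferring data, and the function name is made up for this illustration.

```python
def seek_overhead_seconds(num_blocks: int, seek_ms: float) -> float:
    """Total repositioning delay when a file is read block by block.

    Assumption: every block costs exactly one repositioning of the head,
    and data transfer time is ignored.
    """
    return num_blocks * seek_ms / 1000.0


# Compare a lightly fragmented file with heavily fragmented ones,
# using the 4-10 msec range quoted above.
for blocks in (3, 1_000, 10_000):
    low = seek_overhead_seconds(blocks, 4.0)
    high = seek_overhead_seconds(blocks, 10.0)
    print(f"{blocks:>6} blocks: {low:.3f} s to {high:.3f} s of delay")
```

Under these assumptions, a file in 1,000 blocks spends roughly 4 to 10 seconds just repositioning the head, and a file in 10,000 blocks spends 40 to 100 seconds.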

Each of the one or more blocks that make up a single file is called an extent. The file system maintains a table of these extents (one per file) called an extents table. The extents table records the location of every block in the file (the offset) and the length of that block (length). In this way, the computer knows where to go on the disk and how much data to read from that location. For every block of data in a file there is an offset and a length, which together make up a single extent in the extents table. This is the important thing to remember when you go on to read about how APFS deals with files. The “Nina’s Birthday.mp4” file in my example has two extents, the first of which is 2 GB in length and the second of which is 8 GB.
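One way to picture an extents table is as a list of (offset, length) pairs. The Python sketch below is a simplified model, not the on-disk format of HFS Extended or APFS. The disk offsets are hypothetical; only the two lengths (2 GB and 8 GB) come from the example, and the names `Extent`, `file_size` and `locate` are invented for illustration.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass(frozen=True)
class Extent:
    offset: int  # where the block starts on the disk, in bytes
    length: int  # how many bytes to read from that location


# Hypothetical disk offsets; the lengths match the example movie file.
ninas_birthday = [
    Extent(offset=50 * GB, length=2 * GB),
    Extent(offset=120 * GB, length=8 * GB),
]


def file_size(extents_table):
    """The file's size is the sum of the lengths of its extents."""
    return sum(extent.length for extent in extents_table)


def locate(extents_table, file_pos):
    """Translate a byte position in the file to a byte position on the disk."""
    if file_pos < 0:
        raise ValueError("position must not be negative")
    for extent in extents_table:
        if file_pos < extent.length:
            return extent.offset + file_pos
        file_pos -= extent.length  # skip past this extent
    raise ValueError("position is past the end of the file")


print(file_size(ninas_birthday) // GB)       # 10
print(locate(ninas_birthday, 3 * GB) // GB)  # 121
```

The second result shows the lookup crossing from the first extent into the second: byte 3 GB of the movie lies 1 GB into the extent that starts at the (hypothetical) 120 GB mark.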

So let’s say I need to make a copy of this file. When I copy the file on my HFS Extended volume, my Mac reads the file’s data, locates a free space in the volume for the copy, and then writes the copied data out. If it can, the Mac will write the new file out as a single block. However, in my example, the volume doesn’t contain a single block of space that is 10 GB in size so it has to write out the file as 2 blocks: the first 4 GB in length and the second 6 GB. Both the original file and the copy can be read relatively quickly because each has only 2 blocks, thus 2 extents.

If I now edit the original movie and add four edits (say transitions between different scenes), when I save the changes, they will be written out over the existing data for this file. Even after the edits, my movie file will still contain only 2 extents and can be read relatively quickly.

Copying a file on an APFS volume
For my example with an APFS volume, I will start with the same movie file, “Nina’s Birthday.mp4,” which is made up of 2 extents, the first 2 GB in length and the second 8 GB.

When I copy this file on an APFS volume, the file data doesn’t actually get copied to a new location on the disk. Instead, the file system knows that both the original and the copy contain the exact same data, so both the original file and its copy point to (reference) the same data. They may look like separate files in the Finder but, under the hood, both filenames point to the same place on the disk. And although the original and the copy each has its own extents table, the extents tables are identical.

This is the magic of copy on write and the reason copying a 100 GB file only takes a few seconds: no data is actually being copied! The file system is just creating a new extents table for the copy (there may be other information it needs to keep track of for the new file, but that’s not important in this example).
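The difference between the two kinds of copy can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how either file system is implemented: extents are plain (offset, length) records, the disk offsets and free regions are hypothetical, and `full_copy` and `clone` are names invented here.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass(frozen=True)
class Extent:
    offset: int  # start of the block on disk, in bytes
    length: int  # size of the block, in bytes


def full_copy(table, free_regions):
    """HFS Extended style: every byte is read and written into free space."""
    remaining = sum(extent.length for extent in table)
    new_table, bytes_written = [], 0
    for region in free_regions:
        if remaining == 0:
            break
        chunk = min(region.length, remaining)
        new_table.append(Extent(region.offset, chunk))
        bytes_written += chunk
        remaining -= chunk
    if remaining:
        raise ValueError("not enough free space for the copy")
    return new_table, bytes_written


def clone(table):
    """APFS style: only the extents table is duplicated; the data is shared."""
    return list(table), 0


original = [Extent(50 * GB, 2 * GB), Extent(120 * GB, 8 * GB)]
free = [Extent(200 * GB, 4 * GB), Extent(300 * GB, 6 * GB)]  # hypothetical

hfs_table, hfs_bytes = full_copy(original, free)
apfs_table, apfs_bytes = clone(original)

print(hfs_bytes // GB, [e.length // GB for e in hfs_table])  # 10 [4, 6]
print(apfs_bytes, apfs_table == original)                    # 0 True
```

The full copy moves 10 GB and ends up with its own two extents (4 GB and 6 GB), while the clone moves no file data at all and its table is identical to the original’s.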

I mentioned above that with APFS, an original file and its copy will have identical extents tables. However, this is true only until you make a change to one of them. When I go to create the same 4 transitions in my movie that I created when using my HFS Extended volume, APFS has to find new, unused, space on the disk to store these edits. It can’t write the edits over the original file, like the HFS Extended volume does, because then the changes would exist in both the original file and its copy—remember that the extents table for the file and its copy point to the same location on the disk. So that would be really bad.

Instead, APFS creates a new extent for each of the edits. It also has to create a new extent for the remaining data after the transition, the part of the movie which comes after the transition and which is still the same in both the original movie and its copy. Therefore, for each non-contiguous write, the file system has to create 2 new extents, one for the changed data and one for the original data (common to the original file and its copy), which follows the new data. If this sounds complicated it’s because it is—requiring multiple back-and-forths between the locations of the original file and the files with all the changes. Each back-and-forth is recorded as a new extent.

After writing out my 4 transitions, my original movie file now has 10 extents. This might not seem like a lot of extents but that’s for only 4 edits! Editing an entire movie, or even just retouching a photo, could result in thousands of extents. Imagine what happens with the file used to store your mail messages when you are getting hundreds or thousands of messages a week. And if you are someone who uses Parallels or VMware Fusion, each time you start up your virtual machine it probably results in 100,000 writes. You can see that any of these types of files could easily end up with many thousands of extents.
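The count of 10 extents can be reproduced with a toy model of the extents table. The Python sketch below is an illustration under assumptions, not APFS’s actual allocation logic: the edit positions, the 100 MB edit size and all disk offsets are hypothetical, and `redirect_write` is a name invented for this example. It assumes each edit lands strictly inside existing data that is still shared with the copy.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass(frozen=True)
class Extent:
    offset: int  # start of the block on disk, in bytes
    length: int  # size of the block, in bytes


def redirect_write(table, pos, length, new_offset):
    """Write `length` bytes at file position `pos`, storing the new data at
    disk offset `new_offset` instead of overwriting the shared blocks."""
    size = sum(extent.length for extent in table)
    if pos < 0 or length <= 0 or pos + length > size:
        raise ValueError("write must fall inside the existing file")
    end = pos + length
    result, cursor, inserted = [], 0, False
    for ext in table:
        ext_start, ext_end = cursor, cursor + ext.length
        if ext_start < pos:
            # Data before the write is unchanged and stays where it is.
            result.append(Extent(ext.offset, min(ext_end, pos) - ext_start))
        if not inserted and ext_end > pos:
            # The changed data goes to a new location, as one new extent.
            result.append(Extent(new_offset, length))
            inserted = True
        if ext_end > end:
            # Data after the write is unchanged but now needs its own extent.
            skip = max(end, ext_start) - ext_start
            result.append(Extent(ext.offset + skip, ext.length - skip))
        cursor = ext_end
    return result


original = [Extent(50 * GB, 2 * GB), Extent(120 * GB, 8 * GB)]
edited = list(original)  # the clone starts with an identical table

# Four hypothetical 100 MB transitions at 1, 3, 5 and 7 GB into the movie.
for i, pos in enumerate((1 * GB, 3 * GB, 5 * GB, 7 * GB)):
    edited = redirect_write(edited, pos, 100 * MB, (400 + i) * GB)
    print(f"after edit {i + 1}: {len(edited)} extents")

print(len(original), len(edited))  # 2 10
```

Each edit turns one extent into three (unchanged data before, new data, unchanged data after), a net gain of 2, so four edits take the file from 2 extents to 10. On an HFS Extended volume the same edits would overwrite data in place and the table would keep its 2 extents.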

Now imagine what will happen when your Mac goes to read a file with a thousand or more extents on an HDD. As the file system reaches the end of one extent and starts reading from the next one, it has to wait the 4–10 msec for the disk’s platter and head to get aligned correctly to begin reading the next extent. Multiply this delay by 1,000 or more and the time taken to read these files could become unbearably long.

This long delay when reading large files is the reason I don’t recommend using APFS on HDDs. The delay occurs only with files that have been written to many times, and only if the file has been copied or the volume has a snapshot. But who wants to use a volume where you have to remember not to copy files or use Time Machine?

I think Apple is aware of this problem, as the High Sierra installer does not automatically convert startup volumes on HDDs to APFS when upgrading. In addition, when erasing a disk, the Disk Utility application only chooses APFS as the default file system if it can confirm that the disk is an SSD.

The proof: I knew from the start that this was how copy on write was supposed to work, but just to be sure, I wanted to see what was actually going on at the disk level. Since I am the developer for SoftRAID, I can use the SoftRAID driver to allow me to watch what is actually going on.

I created a special version of the SoftRAID driver, which allowed me to record where on a disk the file system was reading and writing data and how much data was transferred each time. I then edited a file on both HFS Extended and APFS volumes.

With a file on an HFS Extended volume, I could see the original data being overwritten in the same location. I saw this same behavior with a file on an APFS volume as long as the file had not been copied or a snapshot did not exist for this volume. As soon as I copied the file or created a snapshot of the volume, all writes were made to new locations on the volume, locations that were not part of the original file.


LEAVE A COMMENT


  • Thank you for this very clear and detailed explanation. This, along with the article about the “speed” of APFS vs. HFS, is what Apple should be writing. So more kudos to you. Mike Bombich of Carbon Copy Cloner has produced similarly clear, high-quality information. Many, many thanks!!




  • May I offer a bit of history?

    This BOLD! NEW! APFS approach was invented about 40 years ago at the Computer Automation (now long deceased) Austin Development Center.

    The original goal of this approach was to minimize disk space usage for large databases that were often modified (in the early 80s, a 300 MB CDC Storage Module Drive was HUGE – both in storage capacity and in physical size…)

    This “APFS” approach was originally called “Purple Arrows” because in the ADC staff meeting where the Filesystem guys explained it to the entire project staff (I was the compiler guy) they used black markers on our whiteboard to show the original file layout on disk, and then purple markers to show how changes & updates would be handled. [Look at the “Copying files with APFS” diagram above, and mentally translate the yellow arrows in it into Purple Arrows. This is EXACTLY what their whiteboard diagram looked like.]

    And while it would be personally satisfying to be able to use Purple Arrows again, the recent decision by Apple to support only 64-bit apps in ongoing O/S releases leaves too many of my 32-bit legacy apps in the dust. So I’m stayin’ with 10.9.5! Hallelujah! Amen!




  • APFS makes FAKE copies ????
    That’s just… STUPID !!!
    when I copy a file I am protecting it from some corruption of a sector making it unreadable. But under APFS I could make 100 copies and only have ONE physical location on the disk.
    Unbelievably stupid.
    Apple is brain dead.




    • I’ve never heard of someone making a duplicate of a file on the same HD/partition JUST to protect it from the possible corruption of a sector. That’s an unbelievably stupid and horribly inefficient practice of backing up your data. Any rational person who’s worried about data corruption would copy their data to another HD. And anyone who’s extremely paranoid about it, would make a copy onto a third HD and store it at a remote location.

      I have made (and still make) a duplicate of a file when I want to “archive” its current state, so that I can continue to modify the original. But, as soon as I make a modification, the entire file (all the data) gets copied.

      And by the way, my external HDD (RAID) is APFS, however it is only used for backing up files, so the above issue with copy-on-write never affects it, as nothing is ever copied and nothing is ever modified on it.




  • Just wanted to say thanks. Fully talked me out of converting my 2011 Mac mini to APFS.




  • Perhaps an MP4 movie file is not the best example. In most use cases an mp4 video will not have transitions written to the file, rather usually an editing app will render a new movie to a file.

    A better example would be Photoshop, Illustrator, or even InDesign files. PSD (Photoshop) files can be huge and are also copied to save points in time or to serve as a template. This would be more real world in my opinion.

    The other thing that was interesting is that “COW” Copy-on-Write is a bit of a misnomer here, since what is really occurring is “ROW”, redirect on write. Writes to the extent of the original file are redirected to a new extent. Don’t take my word for it though, Google it… I did! :]




  • Tim,

    Sadly I’ve come to a slightly different conclusion why you don’t want to use it.

    The real issue is the limits of SATA I/O queuing. This affects both SATA-connected HDDs and SSDs!




  • Wow! Very illuminating! Thanks, Tim!




  • Funny, I saw the first comment and it said “Incredibly clear.” I had to laugh because I couldn’t follow past the definition of an offset. So, no, I don’t know much about this but I did just have a run-in with APFS and this article raises a question for me: I just recently upgraded to High Sierra and discovered my SSD was automatically reformatted to APFS, so I’m confused as you seem to imply one has a choice not to. I was under the impression that that is the format HS operates within; that one had no choice. Could you elucidate?




  • Incredibly clear. Thank you.




  • If there’s an easy way to force a real copy, that’s the obvious “way out”. Hopefully it’s easier than copying to a second drive and back.




    • If you use the cp command in a Terminal window, it will force a real copy rather than a copy on write. I am investigating other mechanisms as well.




      • Have you had any luck working out ways to copy files that disable or bypass copy on write? The reason I ask is that I have noticed really odd performance degradation when using Parallels virtual machines when copy on write gets used.

        Initially I thought this was related to the fact that when I do backups, I usually shut down my VM and take a clone of the entire thing over to an external hard disk from the internal SSD. Since this takes a while, most of the time I would just copy the file to another directory which is blazing fast on the SSD drive, then do the copy to the back disk while I went back to work in the VM.

        However, with High Sierra I started seeing crazy disk performance problems in my VMs after doing this. The local copy I was sending to the external HDD was a copy on write clone (so not really a copy), and running Parallels on the original afterwards created a lot of overhead.

        Once I realized what was going on, I copied my VM in full to an external HDD, then deleted it, and copied it back again and voila! Performance problems vanished.

        However, now I am seeing some issues again after using Parallels for a long time on High Sierra, and this is after I stopped making local copies. I am gonna try cloning to the external disk again and copying it back to see if that fixes it.

        But ideally we need to find a way to make a copy without copy on write, to ‘flush’ the performance issues.

        Or even better, we need a way for really large files to have copy on write completely disabled! Parallels does not need it.




  • Is this why my hard disk performance seems to have gone down since I “upgraded” to High Sierra?




    • It will only affect your HDD speed if you have converted your volume to APFS or have erased it and created a new APFS volume. If not, some other part of the upgrade is responsible for your reduced performance.




  • Have been waiting years for a new MAC PRO.
    Would appreciate being advised as soon as one becomes available.
    thank you.
    glovideo




  • Have you reported the issue via Apple Bug Reporter?
    http://bugreport.apple.com