Mark Bates
Mon, 22 Feb 2021

Exploring "io/fs" to Improve Test Performance and Testability

The What and Why of io/fs

To understand why the io/fs package was introduced, we need to understand the basic principles of embedding. There are many aspects of embedding that need to be addressed when developing a tool; in this article, however, we will discuss only one of them.

Every tool to embed static files (and there have been many) works essentially the same way. When the tool is run, the static files are converted into bytes in a .go file that is then compiled into the binary. Once compiled, the tool has the responsibility of swapping calls to the file system with calls to a virtualized file system.
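As a rough illustration of what such a generated file might boil down to, the sketch below stores file contents in a map keyed by path and answers reads from memory. The names embeddedFiles and readEmbedded are hypothetical and are not taken from any particular tool; real tools generate considerably more machinery.

```go
package main

import (
	"fmt"
	"os"
)

// embeddedFiles stands in for the data a tool might generate: each
// static file becomes a byte slice that is compiled into the binary.
// The map and its contents are hypothetical.
var embeddedFiles = map[string][]byte{
	"templates/index.html": []byte("<h1>Hello</h1>"),
	"css/site.css":         []byte("body { margin: 0; }"),
}

// readEmbedded plays the role of the virtualized file system: it
// answers reads from memory instead of from disk.
func readEmbedded(name string) ([]byte, error) {
	b, ok := embeddedFiles[name]
	if !ok {
		return nil, fmt.Errorf("open %s: %w", name, os.ErrNotExist)
	}
	return b, nil
}

func main() {
	b, err := readEmbedded("templates/index.html")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", b)
}
```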

Given that we need to replace the file system with a virtualized one when running a binary with embedded assets, we are faced with a question: how do we tell which calls in the code are meant for the virtualized assets and which are meant for the actual file system?

Imagine a tool that walks a directory and returns the names of all the .go files it finds. This tool would be of little use if it couldn't talk to the file system. Now imagine a web application with embedded assets, such as images, templates, and style sheets. The web application should be using the virtualized file system and not the real file system.

To tell these types of calls apart, the tool needs to offer an API that lets the developer indicate what to virtualize and what should be allowed to access the real file system. These APIs come in many flavors. Early embedding tools, such as Packr, used custom APIs.

type Box
	func Folder(path string) *Box
	func New(name string, path string) *Box
	func NewBox(path string) *Box
	func (b *Box) AddBytes(path string, t []byte) error
	func (b *Box) AddString(path string, t string) error
	func (b *Box) Bytes(name string) []byte
	func (b *Box) Find(name string) ([]byte, error)
	func (b *Box) FindString(name string) (string, error)
	func (b *Box) Has(name string) bool
	func (b *Box) HasDir(name string) bool
	func (b *Box) List() []string
	func (b *Box) MustBytes(name string) ([]byte, error)
	func (b *Box) MustString(name string) (string, error)
	func (b *Box) Open(name string) (http.File, error)
	func (b *Box) Resolve(key string) (file.File, error)
	func (b *Box) SetResolver(file string, res resolver.Resolver)
	func (b *Box) String(name string) string
	func (b *Box) Walk(wf WalkFunc) error
	func (b *Box) WalkPrefix(prefix string, wf WalkFunc) error

The upside to custom APIs is complete control over the experience for the tool developer. This includes making it easier for the developer to manage the complex relationships that need to be maintained under the covers.

The downside to this approach is that users of the tool have to learn a new API. Their code also becomes heavily reliant on the custom API, making it difficult for them to upgrade over time.

Another approach is to offer an API that mimics the standard library. An example of this is the Pkger tool.

type File interface {
	Close() error
	Name() string
	Open(name string) (http.File, error)
	Read(p []byte) (int, error)
	Readdir(count int) ([]os.FileInfo, error)
	Seek(offset int64, whence int) (int64, error)
	Stat() (os.FileInfo, error)
	Write(b []byte) (int, error)
}

This approach has the upside of a known API, making it easier for users to embrace the tool without having to learn something new.

This is the approach the standard library took when creating the io/fs package.

The downside of this approach (from which io/fs suffers) is that it can often lead to large, complex interfaces. This large interface footprint is, unfortunately, required to properly mimic calls to the file system, as we will see shortly.

Testing File System Based Code

The io/fs package has benefits far beyond supporting the new embedding feature. One of the biggest benefits is in testing. This new package allows us to write code that interacts with the file system, albeit read-only, that can be easily tested.

In addition to increased testability, the io/fs package helps us write more readable tests and offers large performance gains when testing file system code.

To explore the io/fs package, let's write a function that walks a given root path in search of files ending in .go. As it traverses the file system, it skips directories whose names match certain prefixes, such as .git, node_modules, and testdata. We don't need to search the .git or node_modules folders since we know they won't contain any .go files. If we find a .go file in a supported directory, we append its path to a list of files and continue on.

func GoFiles(root string) ([]string, error) {
	var data []string

	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		base := filepath.Base(path)
		for _, sp := range SkipPaths {
			// if the name of the folder has a prefix listed in SkipPaths
			// then we should skip the directory.
			// e.g. node_modules, testdata, _foo, .git
			if info.IsDir() && strings.HasPrefix(base, sp) {
				return filepath.SkipDir
			}
		}

		// skip non-go files
		if filepath.Ext(path) != ".go" {
			return nil
		}

		data = append(data, path)

		return nil
	})

	return data, err
}

The result of this function would be a slice similar to the following.

[
	"benchmarks_test.go",
	"cmd/fsdemo/main.go",
	"cmd/fsdemo/main_test.go",
	"fsdemo.go",
	"fsdemo_test.go",
	"mock_file.go",
]
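The GoFiles listing refers to a SkipPaths variable that is never shown. A minimal sketch of what it might look like follows. The exact values are an assumption based on the prefixes named in the code comments (node_modules, testdata, _foo, .git), and shouldSkip is a hypothetical helper that isolates the prefix check.

```go
package main

import (
	"fmt"
	"strings"
)

// SkipPaths lists directory-name prefixes to skip while walking.
// These values are assumed from the comments in GoFiles.
var SkipPaths = []string{".git", "node_modules", "testdata", "_"}

// shouldSkip reports whether base starts with any prefix in SkipPaths.
func shouldSkip(base string) bool {
	for _, sp := range SkipPaths {
		if strings.HasPrefix(base, sp) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{".git", "_foo", "web", "testdata"} {
		fmt.Printf("%-10s skip=%v\n", name, shouldSkip(name))
	}
}
```

Because the check is a prefix match, a folder such as _foo is skipped by the single "_" entry without being listed explicitly.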

The question I now pose is how do we test this code? Since this code interacts directly with the file system, how do we make sure we present an accurate file system scenario for the test in question?

Since there are many ways to test this code, let's first look at two common approaches for testing file system code before exploring how io/fs can help us more easily test this code.

JIT Test File Creation

The first approach to testing file system code is to create the necessary file/folder structures needed for that test at runtime.

In this article I will present the tests in the form of benchmarks, which allows us to compare the performance of the various testing approaches. For that reason, the setup is included inside the benchmark loop on purpose: the setup of the test is what we are benchmarking. In this case the underlying function does not change between this and the other test setup approaches.

func BenchmarkGoFilesJIT(b *testing.B) {
	for i := 0; i < b.N; i++ {

		dir, err := ioutil.TempDir("", "fsdemo")
		if err != nil {
			b.Fatal(err)
		}

		names := []string{"foo.go", "web/routes.go"}

		for _, s := range SkipPaths {
			// ex: ./.git/git.go
			// ex: ./node_modules/node_modules.go
			names = append(names, filepath.Join(s, s+".go"))
		}

		for _, f := range names {
			if err := os.MkdirAll(filepath.Join(dir, filepath.Dir(f)), 0755); err != nil {
				b.Fatal(err)
			}
			if err := ioutil.WriteFile(filepath.Join(dir, f), nil, 0666); err != nil {
				b.Fatal(err)
			}
		}

		list, err := GoFiles(dir)

		if err != nil {
			b.Fatal(err)
		}

		lexp := 2
		lact := len(list)
		if lact != lexp {
			b.Fatalf("expected list to have %d files, but got %d", lexp, lact)
		}

		sort.Strings(list)

		exp := []string{"foo.go", "web/routes.go"}
		for i, a := range list {
			e := exp[i]
			if !strings.HasSuffix(a, e) {
				b.Fatalf("expected %q to match expected %q", list, exp)
			}
		}

	}
}

In the BenchmarkGoFilesJIT test we use the io/ioutil package to create temporary directories and files that match the needed scenario for the given test. In this case that means creating directories for node_modules and .git that contain .go files, so we can confirm that they are not included in the results. If the GoFiles function is correct, we should receive only two results: foo.go and web/routes.go.

The JIT approach has two large drawbacks. The first is that the setup code can be quite cumbersome to write and maintain over time. The amount of code needed to set up the tests also leaves more room for bugs within the test code itself. The other big drawback is speed. Tests that create files and directories JIT create a lot of I/O contention on the file system, and I/O operations are among the slowest tasks we can perform.

goos: darwin
goarch: amd64
pkg: fsdemo
cpu: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
BenchmarkGoFilesJIT-16                1470  819064 ns/op
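One way to keep JIT setup manageable is to pull it into a helper that also hands back a cleanup function, since the benchmark above never removes the temporary directories it creates. The sketch below is only an illustration: makeTree and exists are hypothetical names, and it uses os.MkdirTemp and os.WriteFile, the Go 1.16 counterparts of the ioutil functions used above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// makeTree creates a temporary directory containing an empty file for
// each slash-separated name. It returns the directory along with a
// cleanup function that removes everything again.
func makeTree(names ...string) (string, func(), error) {
	dir, err := os.MkdirTemp("", "fsdemo")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }

	for _, name := range names {
		full := filepath.Join(dir, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(full), 0755); err != nil {
			cleanup()
			return "", nil, err
		}
		if err := os.WriteFile(full, nil, 0666); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return dir, cleanup, nil
}

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	dir, cleanup, err := makeTree("foo.go", "web/routes.go")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created:", exists(filepath.Join(dir, "web", "routes.go")))
	cleanup()
	fmt.Println("still there after cleanup:", exists(dir))
}
```

Even with a helper like this, every test run still pays for the disk I/O, which is the cost the benchmark measures.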

Pre-Existing File Fixtures

Another approach to testing the GoFiles function would be to create a testdata folder and fill it with folders containing all of your test scenarios.

testdata
└── scenario1
    ├── _ignore
    │   └── ignore.go
    ├── foo.go
    ├── node_modules
    │   └── node_modules.go
    ├── testdata
    │   └── testdata.go
    └── web
        └── routes.go

5 directories, 5 files

In this approach, since we already have the folder/file structures we need for each of our tests, we can clean up a lot of test code and point the GoFiles function at the folder on disk that contains the appropriate scenario.

func BenchmarkGoFilesExistingFiles(b *testing.B) {
	for i := 0; i < b.N; i++ {

		list, err := GoFiles("./testdata/scenario1")

		if err != nil {
			b.Fatal(err)
		}

		lexp := 2
		lact := len(list)
		if lact != lexp {
			b.Fatalf("expected list to have %d files, but got %d", lexp, lact)
		}

		sort.Strings(list)

		exp := []string{"foo.go", "web/routes.go"}
		for i, a := range list {
			e := exp[i]
			if !strings.HasSuffix(a, e) {
				b.Fatalf("expected %q to match expected %q", list, exp)
			}
		}

	}
}

This approach greatly reduces the footprint of the test, improving its reliability and readability as a result. It also yields much faster tests than the JIT approach, roughly 6.8 times faster in this benchmark.

goos: darwin
goarch: amd64
pkg: fsdemo
cpu: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
BenchmarkGoFilesExistingFiles-16      9795  120648 ns/op
BenchmarkGoFilesJIT-16                1470  819064 ns/op

The downside of this approach is the number and combination of files/folders needed to create a robust test for the GoFiles function. So far, we've only tested the "successful" path. We haven't written tests for error scenarios and other potential issues.

Often what happens when using this approach is that developers start to overload these scenarios for multiple tests. Instead of creating new structures on disk, these scenarios are modified over time to meet the needs of a new test. This couples tests together and increases the likelihood of brittle tests that fail when someone makes a change for another test case.

By rewriting the GoFiles function to make use of the io/fs package, we can address all of these issues.

Using FS

As we have learned, the io/fs package allows for the implementation of a virtual file system. We can leverage this by rewriting the GoFiles function to accept an implementation of the fs.FS interface. In production code we can use the os.DirFS function to get an implementation of the fs.FS interface backed by the underlying file system.

In order to walk the fs.FS implementation we need to use the fs.WalkDir function, which behaves nearly identically to the filepath.Walk function. While the differences between the two are worth exploring, they fall outside the scope of this article, so we'll address them in a future article.

func GoFilesFS(root string, sys fs.FS) ([]string, error) {
	var data []string

	err := fs.WalkDir(sys, ".", func(path string, de fs.DirEntry, err error) error {
		if err != nil {
			return err
		}

		base := filepath.Base(path)
		for _, sp := range SkipPaths {
			// if the name of the folder has a prefix listed in SkipPaths
			// then we should skip the directory.
			// e.g. node_modules, testdata, _foo, .git
			if de.IsDir() && strings.HasPrefix(base, sp) {
				return fs.SkipDir
			}
		}

		// skip non-go files
		if filepath.Ext(path) != ".go" {
			return nil
		}

		data = append(data, path)

		return nil
	})

	return data, err
}

Since these changes to existing code are minimal, we can very quickly get all of the benefits of this new package without an expensive rewrite.
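To make the production side concrete, the sketch below writes a few files to a temporary directory and walks them through os.DirFS. It uses a trimmed-down walker (listGo, a hypothetical stand-in for GoFilesFS) so that the example is self-contained. Note that the paths handed to the callback are relative to the root of the fs.FS and are always slash-separated.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// listGo walks sys from its root and returns the paths of all .go
// files, skipping any directory whose name starts with skipPrefix.
func listGo(sys fs.FS, skipPrefix string) ([]string, error) {
	var data []string
	err := fs.WalkDir(sys, ".", func(p string, de fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if de.IsDir() && strings.HasPrefix(de.Name(), skipPrefix) {
			return fs.SkipDir
		}
		if path.Ext(p) == ".go" {
			data = append(data, p)
		}
		return nil
	})
	return data, err
}

// demo builds a small tree on disk and lists it through os.DirFS.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "fsdemo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	names := []string{"foo.go", "web/routes.go", "node_modules/dep.go"}
	for _, name := range names {
		full := filepath.Join(dir, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(full), 0755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, nil, 0666); err != nil {
			return nil, err
		}
	}
	// os.DirFS roots the fs.FS at dir, so results are relative to it.
	return listGo(os.DirFS(dir), "node_modules")
}

func main() {
	files, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(files) // [foo.go web/routes.go]
}
```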

Implementing FS

With the function now updated to accept an fs.FS, let's look at how we can write tests for it. Before we can write tests, however, we will need an implementation of fs.FS.

type FS interface {
	// Open opens the named file.
	//
	// When Open returns an error, it should be of type *PathError
	// with the Op field set to "open", the Path field set to name,
	// and the Err field describing the problem.
	//
	// Open should reject attempts to open names that do not satisfy
	// ValidPath(name), returning a *PathError with Err set to
	// ErrInvalid or ErrNotExist.
	Open(name string) (File, error)
}

The Open method takes the path of a file and returns an fs.File and an error. There are, as the documentation states, certain requirements regarding errors that must be met.

For our tests we're going to use a slice of our mock file type, which we'll implement shortly, as our fs.FS implementation. A slice will be able to implement all of the functionality we'll need for our tests.

type MockFS []*MockFile

func (mfs MockFS) Open(name string) (fs.File, error) {
	for _, f := range mfs {
		if f.Name() == name {
			return f, nil
		}
	}

	if len(mfs) > 0 {
		return mfs[0].FS.Open(name)
	}

	return nil, &fs.PathError{
		Op:   "open",
		Path: name,
		Err:  os.ErrNotExist,
	}
}

In MockFS.Open we loop through the known files to match the requested name. If a file is found, it is returned. If the file is not found, we delegate to the FS of the first entry, if there is one. This handles the case where the top-level entry is just "." and all other files and directories are in a tree underneath. Finally, if the file is still not found, we return the appropriate error as per the documentation.
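The error contract quoted from the fs.FS documentation is what lets callers inspect failures uniformly. As a small sketch, the hypothetical emptyFS below contains no files at all and exists only to demonstrate the two documented error cases, an invalid name and a missing file. The classify helper is likewise illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// emptyFS is a hypothetical fs.FS that contains no files.
type emptyFS struct{}

// Open follows the documented contract: every failure is a
// *fs.PathError with Op set to "open" and Path set to name.
func (emptyFS) Open(name string) (fs.File, error) {
	if !fs.ValidPath(name) {
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrInvalid}
	}
	return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrNotExist}
}

// classify describes an error returned by Open.
func classify(err error) string {
	var pe *fs.PathError
	switch {
	case err == nil:
		return "ok"
	case !errors.As(err, &pe):
		return "not a PathError"
	case errors.Is(err, fs.ErrInvalid):
		return pe.Op + ": invalid path"
	case errors.Is(err, fs.ErrNotExist):
		return pe.Op + ": does not exist"
	default:
		return pe.Op + ": other"
	}
}

func main() {
	var sys fs.FS = emptyFS{}
	for _, name := range []string{"foo.go", "../escape.go"} {
		_, err := sys.Open(name)
		fmt.Printf("%q -> %s\n", name, classify(err))
	}
}
```

Tests can use the same errors.Is and errors.As checks against a mock, which is why it pays to get the Op and Err fields right.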

Our MockFS implementation is not complete yet, however. We also need a ReadDir method on MockFS to help us with our mock file implementation later. Note that the method takes a count, ReadDir(n int), which is the shape used by fs.ReadDirFile and File.ReadDir; the fs.ReadDirFS interface instead declares ReadDir(name string), so this method does not make MockFS an fs.ReadDirFS. The constraints below come from the fs.ReadDirFile documentation and are worth taking note of and implementing.

// ReadDir reads the contents of the directory and returns
// a slice of up to n DirEntry values in directory order.
// Subsequent calls on the same file will yield further DirEntry values.
//
// If n > 0, ReadDir returns at most n DirEntry structures.
// In this case, if ReadDir returns an empty slice, it will return
// a non-nil error explaining why.
// At the end of a directory, the error is io.EOF.
//
// If n <= 0, ReadDir returns all the DirEntry values from the directory
// in a single slice. In this case, if ReadDir succeeds (reads all the way
// to the end of the directory), it returns the slice and a nil error.
// If it encounters an error before the end of the directory,
// ReadDir returns the DirEntry list read until that point and a non-nil error.

While those rules can sound confusing, in practice this logic is fairly straightforward.

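func (mfs MockFS) ReadDir(n int) ([]fs.DirEntry, error) {
	list := make([]fs.DirEntry, 0, len(mfs))

	for _, v := range mfs {
		list = append(list, v)
	}

	// return entries in a stable order, sorted by name
	sort.Slice(list, func(a, b int) bool {
		return list[a].Name() < list[b].Name()
	})

	// n <= 0 means "return everything" with a nil error
	if n <= 0 {
		return list, nil
	}

	// n > 0: an empty result must be reported with io.EOF
	if len(list) == 0 {
		return nil, io.EOF
	}

	// this mock keeps no read position, so every call
	// starts again from the first entry
	if n > len(list) {
		n = len(list)
	}
	return list[:n], nil
}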
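The documented rules also say that subsequent calls yield further entries, which requires remembering a read position. That is not needed for the tests in this article, but as a sketch of how it could be done, the hypothetical dirCursor below tracks an offset. fstest.MapFS from the standard library is used here only as a convenient source of fs.DirEntry values; all other names are illustrative.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"testing/fstest"
)

// dirCursor is a hypothetical helper that remembers how far it has
// read, so repeated ReadDir calls continue where the last one stopped.
type dirCursor struct {
	entries []fs.DirEntry
	offset  int
}

// ReadDir follows the documented rules for n <= 0 and n > 0.
func (c *dirCursor) ReadDir(n int) ([]fs.DirEntry, error) {
	rest := c.entries[c.offset:]
	if n <= 0 {
		// Return everything that is left, with a nil error.
		c.offset = len(c.entries)
		return rest, nil
	}
	if len(rest) == 0 {
		// An empty result for n > 0 must explain itself.
		return nil, io.EOF
	}
	if n > len(rest) {
		n = len(rest)
	}
	c.offset += n
	return rest[:n], nil
}

// pages reads from c in batches of n until io.EOF and returns the
// size of each batch. For n <= 0 it performs a single read.
func pages(c *dirCursor, n int) ([]int, error) {
	if n <= 0 {
		batch, err := c.ReadDir(n)
		return []int{len(batch)}, err
	}
	var sizes []int
	for {
		batch, err := c.ReadDir(n)
		if err == io.EOF {
			return sizes, nil
		}
		if err != nil {
			return sizes, err
		}
		sizes = append(sizes, len(batch))
	}
}

// newCursor builds a cursor over three in-memory files.
func newCursor() (*dirCursor, error) {
	sys := fstest.MapFS{"a.go": {}, "b.go": {}, "c.go": {}}
	entries, err := fs.ReadDir(sys, ".")
	if err != nil {
		return nil, err
	}
	return &dirCursor{entries: entries}, nil
}

func main() {
	c, err := newCursor()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	sizes, err := pages(c, 2)
	fmt.Println(sizes, err) // [2 1] <nil>
}
```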

Implementing File Interfaces

With our fs.FS implementation completed we now have to implement quite a few interfaces to satisfy the needs of the fs package. Thankfully we can collapse all of these interfaces into one type to make our testing that much easier.

Before continuing I would like to note that I purposefully did not fully implement the file reading portions of the interfaces. This code added unnecessary complexity to the example that wasn't needed for this article. We will discuss this topic in a future post.

To test our code we will need to implement four different interfaces: fs.File, fs.FileInfo, fs.ReadDirFile, and fs.DirEntry.

type File interface {
	Stat() (FileInfo, error)
	Read([]byte) (int, error)
	Close() error
}

type FileInfo interface {
	Name() string
	Size() int64
	Mode() FileMode
	ModTime() time.Time
	IsDir() bool
	Sys() interface{}
}

type ReadDirFile interface {
	File
	ReadDir(n int) ([]DirEntry, error)
}

type DirEntry interface {
	Name() string
	IsDir() bool
	Type() FileMode
	Info() (FileInfo, error)
}

The sheer size of these interfaces might seem overwhelming at first, but thankfully, we can condense them to one type as they contain a lot of overlapping functions.

type MockFile struct {
	FS      MockFS
	isDir   bool
	modTime time.Time
	mode    fs.FileMode
	name    string
	size    int64
	sys     interface{}
}

The MockFile type holds a MockFS which, for directories, holds the files that are in that directory. The rest of the fields in the type are there for us to set as return values for their corresponding functions.

func (m *MockFile) Name() string {
	return m.name
}

func (m *MockFile) IsDir() bool {
	return m.isDir
}

func (mf *MockFile) Info() (fs.FileInfo, error) {
	return mf.Stat()
}

func (mf *MockFile) Stat() (fs.FileInfo, error) {
	return mf, nil
}

func (m *MockFile) Size() int64 {
	return m.size
}

func (m *MockFile) Mode() fs.FileMode {
	return m.mode
}

func (m *MockFile) ModTime() time.Time {
	return m.modTime
}

func (m *MockFile) Sys() interface{} {
	return m.sys
}

func (m *MockFile) Type() fs.FileMode {
	return m.Mode().Type()
}

func (mf *MockFile) Read(p []byte) (int, error) {
	panic("not implemented")
}

func (mf *MockFile) Close() error {
	return nil
}

func (m *MockFile) ReadDir(n int) ([]fs.DirEntry, error) {
	if !m.IsDir() {
		return nil, os.ErrNotExist
	}

	if m.FS == nil {
		return nil, nil
	}
	return m.FS.ReadDir(n)
}

Methods such as Stat() (fs.FileInfo, error) can return the MockFile receiver, as it already implements that interface. This is an example of how our one MockFile type can implement the many interfaces needed.
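When one type is meant to satisfy several interfaces, a common Go idiom is to state that intent with compile-time assertions, so the build fails as soon as a method goes missing. The sketch below shows the idiom on a hypothetical stubFile, a stripped-down cousin of MockFile written only for this example.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"time"
)

// stubFile is a hypothetical, minimal type used only to show the idiom.
type stubFile struct {
	name  string
	isDir bool
}

// Compile-time checks: the build fails if *stubFile stops satisfying
// any of these interfaces.
var (
	_ fs.File        = (*stubFile)(nil)
	_ fs.FileInfo    = (*stubFile)(nil)
	_ fs.ReadDirFile = (*stubFile)(nil)
	_ fs.DirEntry    = (*stubFile)(nil)
)

// Name and IsDir are shared by fs.FileInfo and fs.DirEntry.
func (s *stubFile) Name() string { return s.name }
func (s *stubFile) IsDir() bool  { return s.isDir }

// The remaining fs.FileInfo methods.
func (s *stubFile) Size() int64        { return 0 }
func (s *stubFile) ModTime() time.Time { return time.Time{} }
func (s *stubFile) Sys() interface{}   { return nil }
func (s *stubFile) Mode() fs.FileMode {
	if s.isDir {
		return fs.ModeDir
	}
	return 0
}

// The remaining fs.DirEntry methods.
func (s *stubFile) Type() fs.FileMode          { return s.Mode().Type() }
func (s *stubFile) Info() (fs.FileInfo, error) { return s, nil }

// fs.File and fs.ReadDirFile. The stub has no contents and no children.
func (s *stubFile) Stat() (fs.FileInfo, error)         { return s, nil }
func (s *stubFile) Read([]byte) (int, error)           { return 0, io.EOF }
func (s *stubFile) Close() error                       { return nil }
func (s *stubFile) ReadDir(int) ([]fs.DirEntry, error) { return nil, nil }

func main() {
	s := &stubFile{name: "web", isDir: true}
	info, _ := s.Stat()
	fmt.Println(info.Name(), info.IsDir(), s.Type() == fs.ModeDir)
}
```

The same four assertions could be placed next to MockFile to document which interfaces it is expected to implement.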

Testing With FS

With the MockFS and MockFile types we can now write tests for the GoFilesFS function. Like the previous testing patterns, we first need to set up the folder and file structure needed for the test. With two helper functions, NewFile and NewDir, and the simplicity of a slice backing our fs.FS implementation, we can quickly build complex file and folder structures all in memory.

func BenchmarkGoFilesFS(b *testing.B) {
	for i := 0; i < b.N; i++ {
		files := MockFS{
			// ./foo.go
			NewFile("foo.go"),
			// ./web/routes.go
			NewDir("web", NewFile("routes.go")),
		}

		for _, s := range SkipPaths {
			// ex: ./.git/git.go
			// ex: ./node_modules/node_modules.go
			files = append(files, NewDir(s, NewFile(s+".go")))
		}

		mfs := MockFS{
			// ./
			NewDir(".", files...),
		}

		list, err := GoFilesFS("/", mfs)

		if err != nil {
			b.Fatal(err)
		}

		lexp := 2
		lact := len(list)
		if lact != lexp {
			b.Fatalf("expected list to have %d files, but got %d", lexp, lact)
		}

		sort.Strings(list)

		exp := []string{"foo.go", "web/routes.go"}
		for i, a := range list {
			e := exp[i]
			if e != a {
				b.Fatalf("expected %q to match expected %q", list, exp)
			}
		}

	}
}

func NewFile(name string) *MockFile {
	return &MockFile{
		name: name,
	}
}

func NewDir(name string, files ...*MockFile) *MockFile {
	return &MockFile{
		FS:    files,
		isDir: true,
		name:  name,
	}
}

This setup code does all we need and it does it quite simply and efficiently. If we need to add a new file or folder to this test, it becomes a quick insertion of a line or two. Most importantly, we are not distracted with complex setup code when we are trying to write tests.

BenchmarkGoFilesFS-16               432418    2605 ns/op

Summary

With BenchmarkGoFilesJIT we had a lot of file setup and teardown code that worked directly with the file system. With that approach, we had potential for introducing errors and bugs from the test itself. The setup and teardown code also dominates the beginning of the test and its complexity makes it hard to introduce changes to the test scenario. This approach also performed the worst in benchmarks.

The BenchmarkGoFilesExistingFiles test used pre-existing folder and file scenarios in a testdata folder. This required no setup code in the test; we just point the test at the correct scenario on disk. This approach has other benefits as well: the fixtures are real files that can easily be edited and manipulated with standard tooling. Compared with the JIT approach, using existing scenarios on disk greatly increased performance. The cost of this approach comes in the number of scenarios that need to be created and committed to the repo. These scenarios also tend to get co-opted by other tests and end up creating brittle tests.

Both the JIT and pre-existing scenario approaches suffer from other drawbacks, such as the difficulty of mocking out files with large sizes, file permissions, errors, and so on. The io/fs package, on the other hand, helps us solve these problems.
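As a small sketch of the error case, the wrapper below makes one directory impossible to open, something that is awkward to arrange portably with real files. denyFS, goFiles, and demo are hypothetical names, and fstest.MapFS from the standard library supplies the in-memory tree in place of the MockFS type from this article.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// denyFS is a hypothetical wrapper that fails Open for one name with
// a permission error and passes every other call through.
type denyFS struct {
	fs.FS
	deny string
}

func (d denyFS) Open(name string) (fs.File, error) {
	if name == d.deny {
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrPermission}
	}
	return d.FS.Open(name)
}

// goFiles collects .go files and stops at the first error it is given.
func goFiles(sys fs.FS) ([]string, error) {
	var data []string
	err := fs.WalkDir(sys, ".", func(p string, de fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if path.Ext(p) == ".go" {
			data = append(data, p)
		}
		return nil
	})
	return data, err
}

// demo walks a tree whose "secret" directory cannot be opened. It
// returns the files found before the failure and whether the walk
// ended with a permission error.
func demo() ([]string, bool) {
	sys := denyFS{
		FS:   fstest.MapFS{"foo.go": {}, "secret/key.go": {}},
		deny: "secret",
	}
	files, err := goFiles(sys)
	return files, errors.Is(err, fs.ErrPermission)
}

func main() {
	files, denied := demo()
	fmt.Println(files, denied) // [foo.go] true
}
```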

We have seen how, by changing our code slightly to use the io/fs package, our test code became easier to write. With this approach, no teardown code is needed. Setting up the scenario was as simple as appending to a slice using a simple type, making it easy for us to modify our tests as needed. Our MockFile type lets us mock out file sizes, permissions, errors, and more, as does the MockFS type. On top of all of this, we have seen that by using the io/fs package and implementing its interfaces we were able to make our file system tests more than 300 times faster than JIT testing (2605 ns/op versus 819064 ns/op).

goos: darwin
goarch: amd64
pkg: fsdemo
cpu: Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
BenchmarkGoFilesFS-16               432418    2605 ns/op
BenchmarkGoFilesExistingFiles-16      9795  120648 ns/op
BenchmarkGoFilesJIT-16                1470  819064 ns/op

While this article covers how we can use the new io/fs package to our advantage for testing, that is only the tip of the iceberg for this package. Consider, for example, a file transformation pipeline that runs a transformer function on each file based on its file type, e.g. converting Markdown to HTML for .md files. Using the io/fs package, you can easily create this pipeline, passing along interfaces as you go, and testing this pipeline would also be relatively straightforward. There is a lot to be excited about in Go 1.16, but, for me, the io/fs package is the one I'm most excited for.
