Servers of Happiness
At Least Authority, we understand the sensitivity of your data and are committed to confidentiality, integrity, and reliability as critical components of our products and services.
When clients encode files, they control several parameters. One of these is called “happiness” (or simply “H”), which tells the client the minimum number of servers across which shares must be placed. We have recently improved the algorithm used to determine share placement.
Before we discuss servers-of-happiness, let’s briefly explain what Tahoe-LAFS is.
Tahoe-LAFS is a free, open-source cloud storage system with verifiable end-to-end security. It distributes user data across multiple servers but, unlike other services, it is built with provider-independent security. This means that providers of storage services do NOT have to be trusted for the integrity or privacy of files.
How Tahoe-LAFS works
With Tahoe-LAFS, you run a client program on your computer which talks to one or more storage servers on other computers.
When you tell your client to store a file, it: encrypts that file; encodes it into multiple pieces; then spreads those pieces out among multiple servers. The pieces are all encrypted and protected against modifications. That means a malicious or misconfigured storage server cannot alter or view your data.
When you ask your client to retrieve the file, it will: find the necessary pieces; make sure they have not been corrupted; reassemble them; and decrypt the result using a key known only to the client.
- The client creates more pieces (or “shares”) than it will eventually need, so even if some of the servers fail or corrupt the data, you can still get your data back.
- Corrupt shares are detected and ignored, so the system can tolerate server-side hard-drive errors (or even malicious storage servers).
- All files are encrypted (with a unique key) before uploading, so even a malicious server operator cannot read your data.
This leaves you independent of the servers and service providers. You don’t have to rely on them for confidentiality, integrity, or absolute availability. This is why we call it “provider-independent security”.
Encoding of shares in Tahoe-LAFS
Tahoe-LAFS uses erasure coding (implemented by the ZFEC library) to encode data according to two parameters: the total number of shares (K) and the number required to reconstruct the original ciphertext (N). This creates some redundant data (the amount depends on the parameters), so that a client only needs to find N shares (out of K total) to reconstruct the ciphertext.
For example, a client may set K=5 and N=2, meaning that 5 shares will be produced in total and any 2 of them are sufficient to recover the data.
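As a rough illustration of what these two parameters imply, the sketch below computes the storage expansion and the number of shares that can be lost. The function name and dictionary keys are hypothetical, and the sketch assumes each share is about 1/needed of the ciphertext size, ignoring any per-share overhead.

```python
from fractions import Fraction


def encoding_summary(total_shares: int, needed_shares: int) -> dict:
    """Summarize an erasure-coding configuration (illustrative only)."""
    if not 1 <= needed_shares <= total_shares:
        raise ValueError("need 1 <= needed_shares <= total_shares")
    return {
        # Each share holds roughly 1/needed of the ciphertext, so storing
        # every share costs total/needed times the ciphertext size.
        "expansion": Fraction(total_shares, needed_shares),
        # Any `needed_shares` shares suffice, so this many may be lost.
        "tolerated_losses": total_shares - needed_shares,
    }


# The example from the text: 5 shares in total, any 2 sufficient.
summary = encoding_summary(total_shares=5, needed_shares=2)
print(summary["expansion"], summary["tolerated_losses"])  # 5/2 3
```

With the example parameters, the uploaded data takes about 2.5 times the space of the ciphertext, and up to 3 shares can be lost before the file becomes unrecoverable.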
Once these shares are encoded locally, the client must find storage servers that will accept them and upload the shares. These parameters are controlled by the client, but the client cannot control which storage servers are available at a given time.
A third metric: “servers-of-happiness”
In addition to K and N, Tahoe-LAFS uses a third parameter H which we call “happiness”.
This metric is a test that looks at a file on a Tahoe-LAFS grid and measures how well it is distributed across the available storage servers. The test computes how many different servers you might need to contact to find N shares. The value H itself is the minimum number of servers over which the shares must be distributed. For example, if H=3 but only 2 servers are reachable at the time of upload, the upload will fail (the file would be “not healthy”).
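The failure case in that example can be written as a simple pre-check. This is a minimal sketch with a hypothetical function name, not the Tahoe-LAFS API; it tests only a necessary condition, since having enough reachable servers does not by itself guarantee that the shares end up well distributed.

```python
from typing import Set


def upload_can_be_happy(reachable_servers: Set[str], happiness: int) -> bool:
    """Necessary (not sufficient) condition for reaching the happiness target."""
    # Shares cannot be spread over more servers than we can reach.
    return len(reachable_servers) >= happiness


print(upload_can_be_happy({"A", "B"}, happiness=3))       # False
print(upload_can_be_happy({"A", "B", "C"}, happiness=3))  # True
```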
Before the “servers of happiness” changes, we used an iterative placement algorithm that would place some shares, examine what happened, and then keep trying. This works fairly well, but doesn’t always achieve happiness even when doing so is theoretically possible. Our new algorithm results from the thesis work of Mark Berger as well as coding and review by many other Tahoe-LAFS contributors.
Currently this new upload strategy is used only for immutable file uploads.
How the algorithm works
Our new upload placement algorithm will find an optimal solution (if one exists). Optimal here means an upload solution with maximal happiness. Note that there can be multiple “optimal” answers to the same placement problem. Using the same example numbers, it is possible to place 5 shares on a grid of 3 servers such that the happiness value “3” is not actually achieved: you could place 4 shares on server A and 1 share on server B, leaving the third server empty. Furthermore, servers can suffer from transient failures (such as network disconnects).
It is also worth noting that there are many different placements of the 5 shares that do satisfy an H value of 3. If the shares are numbered S1..S5, then one could place S1 on server A, S2 on server B, and S3 on server C. After that, it doesn’t matter where S4 and S5 are placed (even both on the same server), as the happiness metric will be satisfied. Obviously, there must be at least 3 servers in the grid to be able to achieve H=3. If there are more than 3 servers, happiness can be maximized by placing at least one share on each server (up to the number of shares).
The 10-step algorithm now used for share placement is described in detail in the specification documents. Ignoring many details: we build a bipartite graph connecting shares to available servers and find a maximum matching in it. Finding and selecting one such matching is done using the efficient “Ford-Fulkerson” algorithm.
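To make the matching idea concrete, here is a minimal sketch that measures the happiness of a placement as the size of a maximum matching between shares and servers. It uses a plain augmenting-path search, which is what Ford-Fulkerson reduces to on a bipartite graph with unit capacities. The function name and data layout are assumptions made for illustration; this is not the Tahoe-LAFS implementation and it skips the other steps of the real algorithm.

```python
from typing import Dict, Set


def happiness(share_locations: Dict[str, Set[str]]) -> int:
    """Return the size of a maximum matching between shares and servers.

    `share_locations` maps each share to the set of servers holding it.
    """
    matched_share: Dict[str, str] = {}  # server -> share paired with it

    def try_assign(share: str, visited: Set[str]) -> bool:
        # Search for an augmenting path that starts at `share`.
        for server in sorted(share_locations[share]):
            if server in visited:
                continue
            visited.add(server)
            # Take a free server, or move the share already paired with
            # this server onto some other server it can use.
            if server not in matched_share or try_assign(matched_share[server], visited):
                matched_share[server] = share
                return True
        return False

    # Each successful search grows the matching by exactly one pair.
    return sum(try_assign(share, set()) for share in sorted(share_locations))


# 4 shares on server A and 1 share on server B.
lopsided = {"S1": {"A"}, "S2": {"A"}, "S3": {"A"}, "S4": {"A"}, "S5": {"B"}}
# S1, S2, S3 on distinct servers; S4 and S5 both on server A.
spread = {"S1": {"A"}, "S2": {"B"}, "S3": {"C"}, "S4": {"A"}, "S5": {"A"}}

print(happiness(lopsided))  # 2
print(happiness(spread))    # 3
```

The first placement reaches a happiness of only 2 and so fails a target of H=3, while the second satisfies it regardless of where S4 and S5 sit.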
There are further wrinkles to this:
- Some shares may already exist on some servers, for example from previous attempts or because the client is re-encoding the same file (we use deterministic encryption)
- Some servers will not have enough space to accept the shares
- Errors may occur while uploading the shares after we’ve selected servers
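The first two wrinkles affect which share-to-server pairings are possible at all. The sketch below shows one way such constraints could be turned into the edges of the bipartite graph. The `Server` class, its fields, and `candidate_edges` are hypothetical names introduced for illustration, and upload errors are not handled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Server:
    name: str
    free_bytes: int
    existing_shares: Set[int] = field(default_factory=set)


def candidate_edges(
    servers: List[Server], share_numbers: List[int], share_size: int
) -> Dict[int, Set[str]]:
    """Map each share to the servers that already hold it or have room for it."""
    edges: Dict[int, Set[str]] = {share: set() for share in share_numbers}
    for server in servers:
        for share in share_numbers:
            already_there = share in server.existing_shares
            has_room = server.free_bytes >= share_size
            if already_there or has_room:
                edges[share].add(server.name)
    return edges


servers = [
    Server("A", free_bytes=0, existing_shares={0}),  # full, but already holds share 0
    Server("B", free_bytes=1000),                    # plenty of space
    Server("C", free_bytes=10),                      # too small for any share
]
edges = candidate_edges(servers, share_numbers=[0, 1, 2], share_size=100)
print(sorted(edges[0]), sorted(edges[1]), sorted(edges[2]))  # ['A', 'B'] ['B'] ['B']
```

In this example only servers A and B are usable, so no placement can spread the shares over more than 2 servers.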
More details
You may read further details about measuring servers-of-happiness in the Tahoe-LAFS documentation.
We are constantly trying to improve our product by adding new features, and we would love to hear your feedback! If you have any suggestions, comments, or questions about servers-of-happiness, please send them our way at firstname.lastname@example.org.
At Least Authority, we believe in security by design; we do not believe in security by policy. In keeping with our conviction that the authenticity and confidentiality of your data should be independent of your storage provider, we design software that is inspired by the end-to-end principle and the principle of least authority.
If you are interested in using, contributing to, or supporting our products, contact the Least Authority team or sign up to be notified about any updates.