Engineering Speaks: How we used NFS and Samba to create the perfect cloud storage server
Engineering Speaks is where the Signeasy engineering brains stop by to narrate inside stories of innovation, engineering culture, and the hacks that go into ensuring the best user experience.
Long before AWS released EFS, we at Signeasy had our own version of it running in production. We still use a slightly modified version of it to this day, though to solve a different problem. Here's the complete lowdown.

“What do you do when a critical file storage system bloats up and becomes a bottleneck for scaling your business?”

The engineering team at Signeasy faced this question while building a long-term storage solution. Rolling out that solution needed time, but waiting for it to be ready was not an option given how fast the file storage system was bloating up. The situation demanded a quick fix that would break through the storage problem, keep the lights on, and spare our users any glitch in their experience.

How we approached the problem

To give you some background on Signeasy’s stack: we run mostly Ubuntu servers, along with a few Windows ones (ping us to know why), on AWS. Our Linux and Windows servers had to share files with each other, so we needed something that could be set up quickly and was low maintenance. Firefighting mode pushed us toward a stop-gap fix, and we went with Network File System (NFS), a straightforward way to share files across servers. NFS lets multiple clients (our “slaves”) mount a filesystem exported by a remote server (the “master”) and use it as if it were local. It is not the best solution out there, but by the looks of it, it fit the bill. Read more about setting up NFS in Ubuntu here.

And hacked our way through it

We set up a textbook NFS architecture with one master and multiple slaves. The storage sat on a RAID 10 array of EBS volumes, which kept a mirrored copy of every block and let us ride out EBS disk failures. NFS on Windows was still a tricky affair, and that is where Samba came to our rescue. A Samba server lets a Windows machine use a filesystem hosted on a remote Linux machine as if it were local.
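To make the server and client sides concrete, here is a minimal Python sketch that renders the three config fragments this kind of setup typically needs. These are an `/etc/exports` entry on the storage server, an `/etc/fstab` entry on each Linux client, and a share section for Samba's `smb.conf`. The export path, subnet, hostname, mount point and share name are hypothetical placeholders. The options shown are common defaults, not a record of our production configuration.

```python
# Hypothetical placeholder values for illustration only.
EXPORT_PATH = "/srv/storage"
CLIENT_SUBNET = "10.0.0.0/16"
SERVER_HOST = "storage-server"
MOUNT_POINT = "/mnt/storage"


def nfs_export_line(path: str, subnet: str) -> str:
    """Line for /etc/exports on the storage server (the NFS "master")."""
    # rw: clients may write; sync: reply only after data reaches disk;
    # no_subtree_check: disable subtree checking (a common default choice).
    return f"{path} {subnet}(rw,sync,no_subtree_check)"


def nfs_fstab_line(server: str, path: str, mount_point: str) -> str:
    """Line for /etc/fstab on each Linux API server (an NFS client)."""
    # Fields: remote export, local mount point, fs type, options, dump, fsck pass.
    # _netdev tells the system to wait for networking before mounting.
    return f"{server}:{path} {mount_point} nfs defaults,_netdev 0 0"


def samba_share_section(name: str, path: str) -> str:
    """Share definition to append to /etc/samba/smb.conf for Windows clients."""
    return "\n".join([
        f"[{name}]",
        f"   path = {path}",
        "   read only = no",
        "   browseable = yes",
    ])


if __name__ == "__main__":
    print(nfs_export_line(EXPORT_PATH, CLIENT_SUBNET))
    print(nfs_fstab_line(SERVER_HOST, EXPORT_PATH, MOUNT_POINT))
    print(samba_share_section("storage", EXPORT_PATH))
```

After editing `/etc/exports`, running `sudo exportfs -ra` on the server re-reads it. A Windows machine could then map the Samba share with something like `net use Z: \\storage-server\storage`.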
Setting up both NFS and Samba on the file storage machine made the filesystem accessible on all our API servers. Read more about setting up Samba in Ubuntu here.

After the setup, here is what our cloud storage system looked like:
- A central storage server with EBS-backed volumes in a RAID 10 array, running two file-sharing services (an NFS server and a Samba server), held all our data.
- EC2 security groups were set up to grant the API servers access, and the servers referred to shared files by relative file paths.
- The Windows machines needed no additional setup to use the remote filesystem.
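Two pieces of the architecture above can be sketched in code. The first is the security-group rules that open the storage server only to the API servers. The second is what RAID 10 costs in capacity. The rule dictionaries follow the `IpPermissions` structure accepted by boto3's EC2 `authorize_security_group_ingress`. Two details are illustrative assumptions: that only TCP 2049 (NFSv4) and TCP 445 (SMB) need to be open, and the group ID used. NFSv3 would additionally need the rpcbind and mountd ports.

```python
from typing import Any, Dict, List

# Assumption: NFSv4 (TCP 2049) and SMB (TCP 445) are the only ports needed.
NFS_PORT = 2049
SMB_PORT = 445


def storage_ingress_rules(api_server_sg_id: str) -> List[Dict[str, Any]]:
    """Build IpPermissions entries (boto3 format) that allow members of the
    API servers' security group to reach the storage server over NFS and SMB."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Referencing a security group instead of a CIDR means new API
            # servers get access as soon as they join that group.
            "UserIdGroupPairs": [{"GroupId": api_server_sg_id}],
        }
        for port in (NFS_PORT, SMB_PORT)
    ]


def raid10_usable_gib(disk_count: int, disk_size_gib: int) -> int:
    """Usable capacity of a classic RAID 10 array (striped mirrored pairs):
    every block is stored twice, so half of the raw capacity is usable."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("classic RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_size_gib // 2


if __name__ == "__main__":
    for rule in storage_ingress_rules("sg-api-servers"):  # hypothetical group ID
        print(rule)
    print(raid10_usable_gib(4, 500), "GiB usable")
```

Passing the result as `IpPermissions` to `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=...)` would apply the rules to the storage server's group. On the capacity side, four 500 GiB EBS volumes in RAID 10 yield 1,000 GiB usable. That halving is the price of surviving a disk failure.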
We held our breath and rolled it out

Our first few days were jittery: our customers are not used to spotty service, and the team did not want to change that status quo. (And by the way, there was lingering skepticism, as we are not believers in 100% availability.) Files transferred quickly because the servers were strategically placed in the right availability zones. Apart from small glitches whenever a corrupted file was being transferred, the system ran smoothly, like a well-oiled engine. We had no major outages and no data loss or corruption from transferring files between the machines. In short, we were pleasantly surprised by how efficient and neat the system was.

And it saved us some dollars too!

Amazon announced EFS only in May 2015, and by then we had been using our own version of it for about 10 months. A back-of-the-napkin calculation showed that our setup was 33% cheaper than what we would have been billed for EFS. That said, EFS does come with its own advantages: it is easier to scale down and offers better availability.

P.S.: This system is still in production today, but no longer as our primary data storage, since we soon moved to our permanent storage system. We will soon share how we improved on this system and moved to our permanent fix without any service downtime!