Challenge
A PACE (Property Assessed Clean Energy) financing company, Ygrene has funded more than $1 billion in loans since 2010. In order to approve and process those loans, "We have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data," says Ygrene Development Manager Austin Adams. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically. We had a really unstable system that became overwhelmed with requests just for doing background data processing in real time. The performance the users saw was very poor. We needed a solution that wouldn't require us to make huge refactors to the code base." As a finance company, Ygrene also needed to ensure that they were shipping their applications securely.
Solution
Moving from an Engine Yard platform and Amazon Elastic Beanstalk, the Ygrene team embraced cloud native technologies and practices: Kubernetes to help scale out vertically and distribute workloads, Notary to put in build-time controls and get trust on the Docker images being used with third-party dependencies, and Fluentd for "observing every part of our stack," all running on Amazon EC2 Spot.
Impact
Before, deployments typically took three to four hours, and two or three months' worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for the overall deploy with smoke testing. And "we're able to deploy three or four times a week, with just one week's or two days' worth of work," Adams says. "We're deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed." Additionally, by using the kops project, Ygrene can now run its Kubernetes clusters with AWS EC2 Spot, at a tenth of the previous cost. These cloud native technologies have "changed the game for scalability, observability, and security—we're adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn't tell our investors and team members that we knew what was going on."
A PACE (Property Assessed Clean Energy) financing company, "We take the equity in a home or a commercial building, and use it to finance property improvements for anything that saves electricity, produces electricity, saves water, or reduces carbon emissions," says Development Manager Austin Adams.
In order to approve those loans, the company processes an enormous amount of underwriting data. "We have tons of different points that we have to validate about the property, about the company, or about the person," Adams says. "So we have lots of data sources that are being aggregated, and we also have lots of systems that need to churn on that data in real time."
By 2017, deployments and scalability had become pain points. The company was utilizing massive servers, and "we just reached the limit of being able to scale them vertically," he says. Migrating to AWS Elastic Beanstalk didn't solve the problem: "The Scala services needed a lot of data from the main Ruby on Rails services and from different vendors, so they were asking for information from our Ruby services at a rate that those services couldn't handle. We had lots of configuration misses with Elastic Beanstalk as well. It just came to a head, and we realized we had a really unstable system."
Adams along with the rest of the team set out to find a solution that would be transformational, but "wouldn't require us to make huge refactors to the code base," he says. And as a finance company, Ygrene needed security as much as scalability. They found the answer by embracing cloud native technologies: Kubernetes to help scale out vertically and distribute workloads, Notary to achieve reliable security at every level, and Fluentd for observability. "Kubernetes was where the community was going, and we wanted to be future proof," says Adams.
With Kubernetes, the team was able to quickly containerize the Ygrene application with Docker. "We had to change some practices and code, and the way things were built," Adams says, "but we were able to get our main systems onto Kubernetes in a month or so, and then into production within two months. That's very fast for a finance company."
How? Cloud native has "changed the game for scalability, observability, and security—we're adding new data sources that are very secure," says Adams. "Without Kubernetes, Notary, and Fluentd, we couldn't tell our investors and team members that we knew what was going on."
Notary, in particular, "has been a godsend," says Adams. "We need to know that our attack surface on third-party dependencies is low, or at least managed. We use it as a trust system and we also use it as a separation, so production images are signed by Notary, but some development images we don't sign. That is to ensure that they can't get into the production cluster. We've been using it in the test cluster to feel more secure about our builds."
By using the kops project, Ygrene was able to move from Elastic Beanstalk to running its Kubernetes clusters on AWS EC2 Spot, at a tenth of the previous cost. "In order to scale before, we would need to up our instance sizes, incurring high cost for low value," says Adams. "Now with Kubernetes and kops, we are able to scale horizontally on Spot with multiple instance groups."
That also helped them mitigate the risk that comes with running in the public cloud. "We figured out, essentially, that if we're able to select instance classes using EC2 Spot that had an extremely low likelihood of interruption and zero history of interruption, and we're willing to pay a price high enough, that we could virtually get the same guarantee using Kubernetes because we have enough nodes," says Software Engineer Zach Arnold, who led the migration to Kubernetes. "Now that we've re-architected these pieces of the application to not live on the same server, we can push out to many different servers and have a more stable deployment."
As a result, the team can now ship code any time of day. "That was risky because it could bring down your whole loan management software with it," says Arnold. "But we now can deploy safely and securely during the day."
Before, deployments typically took three to four hours, and two or three months' worth of work would be deployed at low-traffic times every week or two weeks. Now, they take five minutes for Kubernetes, and an hour for an overall deploy with smoke testing. And "we're able to deploy three or four times a week, with just one week's or two days' worth of work," Adams says. "We're deploying during the work week, in the daytime and without any downtime. We had to ask for business approval to take the systems down for 30 minutes to an hour, even in the middle of the night, because people could be doing loans. Now we can deploy, ship code, and migrate databases, all without taking the system down. The company gets new features without worrying that some business will be lost or delayed."
Cloud native also affected how Ygrene's 50+ developers and contractors work. Adams and Arnold spent considerable time "teaching people to think distributed out of the box," says Arnold. "We ended up picking what we call the Four S's of Shipping: safely, securely, stably, and speedily." (For more on the security piece of it, see their article on their "continuous hacking" strategy.) As for the engineers, says Adams, "they have been able to advance as their software has advanced. I think that at the end of the day, the developers feel better about what they're doing, and they also feel more connected to the modern software development community."
Looking ahead, Adams is excited to explore more CNCF projects, including SPIFFE and SPIRE. "CNCF has been an amazing incubator for so many projects," he says. "Now we look at its webpage regularly to find out if there are any new, awesome, high-quality projects we can implement into our stack. It's actually become a hub for us for knowing what software we need to be looking at to make our systems more secure or more scalable."