12.04.2021
written by Pascal Luckhaus
Don’t get me wrong, we love GitLab, and for the most part we are completely happy with it. We value their commitment to transparency, especially when it comes to production incidents.
Reliability: In the first quarter of 2021, GitLab suffered multiple incidents resulting in degraded service levels for shared runners [see: 1, 2]. This not only brought our development process to a halt but also endangered our production deployments.
Availability: Even when no active incident was reported by GitLab, we observed on multiple occasions that jobs were stuck in a pending state (i.e. waiting for a runner) for up to 10 minutes. Pipeline speed is an important CI/CD KPI for us, and we try to optimise for it, for example through parallel job execution (a minimal sketch follows below). Fast pipelines are not only important for developers to get quick feedback on their changes; there is also a business impact to consider, as some of our legacy applications cannot be deployed without downtime.
Performance: We have two applications that stand out from our other services in terms of high complexity and slow build speed: one is a Java back end, the other an Angular front end. While those builds were never fast to begin with, we observed that they ran a lot faster on our local development machines than in the pipeline.
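As mentioned under Availability, parallel job execution is one of the levers we pull for pipeline speed. The snippet below is only an illustrative sketch of what this can look like in a .gitlab-ci.yml: the job, image, and test script are placeholders and not taken from our actual pipeline, while the `parallel` keyword and the CI_NODE_* variables are standard GitLab CI features.

```yaml
# Sketch only: split a slow test job into four parallel instances.
test:
  stage: test
  image: node:16                  # placeholder image, not from our pipeline
  parallel: 4                     # GitLab starts four copies of this job
  script:
    # GitLab injects CI_NODE_INDEX (1..4) and CI_NODE_TOTAL (4); a test
    # runner can use them to pick its share of the suite.
    - ./run-tests.sh --shard "$CI_NODE_INDEX" --total "$CI_NODE_TOTAL"
```

Of course, parallel jobs only shorten pipelines if enough runners are free to pick them up, which is exactly where the pending-job problem described above hurt us.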
Working towards a solution, we had a couple of constraints. Our GitLab plan includes a certain number of CI/CD minutes on shared runners that we wanted to use as much as possible. You don’t like throwing away money, do you? At the same time, we were getting close to maxing out this monthly quota, in which case we would have had to buy additional shared runner minutes. Having read the section above, you might understand why we were reluctant to do that. Any new solution had to be cost-efficient, so setting up an old-fashioned build server on AWS that idles 80% of the time was out of the question for us.
The GitLab documentation lists a couple of options for setting up runners, which led us to the following decisions:
Through experimentation, we found that the sweet spot for building our complex applications was a c5.xlarge instance with 4 cores and 8 GB of memory. Most jobs, such as deployment jobs using Terraform or Ansible, or even smaller TypeScript builds, did not benefit significantly from anything above a t3a.small instance. For the instance running the Docker Machine executor, which we will simply refer to as the Runner Manager in the following, we used tiny t2.micro instances.
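To stay within the shared-runner quota while still getting fast builds, jobs can be routed to specific runners via tags in the .gitlab-ci.yml. The following is only a sketch under our assumptions: the tag name, job names, and commands are illustrative rather than our actual configuration; the `tags` keyword itself is a standard GitLab CI feature.

```yaml
# Sketch only: send heavy builds to the self-hosted autoscaled runners
# (registered here under the hypothetical tag "aws-autoscale") and leave
# lightweight jobs to GitLab's shared runners.
build-backend:
  stage: build
  tags:
    - aws-autoscale            # hypothetical tag for the c5.xlarge fleet
  script:
    - ./mvnw package           # placeholder build command

deploy:
  stage: deploy
  # No tags: picked up by the shared runners, which is sufficient for
  # lightweight Terraform or Ansible jobs.
  script:
    - terraform apply -auto-approve   # placeholder deployment command
```

The autoscaling itself (the Runner Manager on a t2.micro spawning c5.xlarge build machines on demand) is configured on the runner side in its config.toml, not in the pipeline definition.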
Have you had similar experiences with GitLab Runners? Any questions? Connect and write to me on LinkedIn!