Google Summer Of Code '16
All the commits in this repository, as well as all the pushes to the following Docker repositories, were done as part of Google Summer of Code 2016:
- https://hub.docker.com/r/pallavagarwal07/gentoo-stabilization/
- https://hub.docker.com/r/pallavagarwal07/testimages/
Goal
Gentoo is an operating system with an extreme focus on configurability and performance. To provide a fully customizable experience without interfering with the stability of the system, Gentoo has a concept of masked packages. These masked packages (or versions) are either untested or known to be unstable, and are installed only if the user explicitly unmasks them. While this concept is a boon to the stability of the operating system, the current implementation requires the packages to be tested manually by a team of developers. This significantly increases the time it takes for new packages to be made safely available to users. The goal of this project is to provide a mechanism to test and stabilize packages automatically, with little or no human intervention.
Result
During this project, I built Orca, a continuous build and stabilization engine that automatically finds packages with build/compile/test errors in the Gentoo Portage tree. Any user can also install the client to help in the stabilization process. All build results are stored in a single place on the server.
Usage:
You can help in the stabilization of Gentoo packages by downloading the file `orca-client` from the repository and running it. Please note that you need Docker installed to use the stabilization container.
If you are on the Gentoo operating system, you can use the ebuild in the folder `utilities/ebuilds/`. This ebuild will be added to the Gentoo Portage tree as soon as the backward-incompatible changes to the server are done, since the ebuild may not be fit for use just yet.
Repository Structure
```
├── Containers
│   ├── Dockerfile-(Client,Server,Solver)
│   ├── etc_portage
│   │   └── .................. :: Contents for /etc/portage in container
│   ├── Orca-Deployment.yml
│   ├── scripts
│   │   ├── ControlContainer
│   │   │   └── .............. :: Client container files
│   │   ├── FlagGenerator
│   │   │   └── .............. :: Flag, Dep Solver container files
│   │   └── Server
│   │       └── .............. :: Server (graph container) files
│   └── orca-client .......... :: WRAPPER script for client (symlinked)
├── Documents
│   └── ...................... :: Proposal, arch diagrams, minutes of meeting
├── utilities
│   ├── addAll.sh
│   ├── bugzillaRESTapi
│   │   ├── bug_file.py
│   │   └── bugzilla.py
│   ├── ebuilds .............. :: Ebuilds of client for Gentoo users
│   │   └── orca_ci-0.1.ebuild
│   ├── kubeUtils.sh
│   └── packages
├── LICENSE
└── README.md
```
Methodology
Often it is valuable to know that a package is failing to build, even if the exact reason for the failure is not known. The whole of Orca is built on this mindset: its main task is to indicate that an error exists, rather than to figure out why it exists.
Orca consists of two parts, the server and the client. Both play an important role in the stabilization process.
Server:
The server hosts a database of all the packages that have been stabilized or need to be stabilized. It receives requests from the client and orchestrates the stabilization jobs. But it isn't enough to just have all the packages in a dump: a package cannot be stabilized until all its dependencies have been stabilized.
To get around this problem, the server maintains a tree-like structure of all packages, in which the dependencies of a package occur as its children. Doing this isn't as easy as it seems, because packages often have circular dependencies, and the server needs a directed acyclic graph (DAG) to function properly. Consider the following:
```
        A
       / \
      ↙   ↘
     B     C
    / \
   ↙   ↘
  D     C
 /
↙
A
```
There are two repeated nodes in the above graph, `A` and `C`. However, `C` doesn't actually give us any trouble, because the tree still remains acyclic (since it is directed). The chain `A -> B -> D -> A`, on the other hand, forms a directed cycle. To resolve such cycles, the server replaces one of the nodes with a new "fake" node, which turns the tree into:
```
        A
       / \
      ↙   ↘
     B     C
    / \
   ↙   ↘
  D     C
 /
↙
A*
```
This again makes the graph acyclic. Starred nodes (i.e. the fake nodes, not the top-level `A` node) are assumed to be stable and are never sent for stabilization.
In the resulting graph, the server looks for a leaf node, and sends it for stabilization.
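To illustrate the cycle-breaking step, here is a minimal Python sketch. It is not the actual server code (whose graph lives in MongoDB); it just shows the idea: walk the dependency graph depth-first and redirect any edge that would close a directed cycle to a fake starred node.

```python
# Minimal sketch of the cycle-breaking idea (hypothetical, not the server code).
# deps maps a package to the packages it depends on.
deps = {
    "A": ["B", "C"],
    "B": ["D", "C"],
    "C": [],
    "D": ["A"],   # D -> A closes the cycle A -> B -> D -> A
}

def break_cycles(deps):
    """Return a copy of the graph where every back edge u -> v is
    redirected to a fake node "v*", which is assumed stable."""
    acyclic = {pkg: [] for pkg in deps}
    on_stack, visited = set(), set()

    def visit(pkg):
        on_stack.add(pkg)
        visited.add(pkg)
        for dep in deps[pkg]:
            if dep in on_stack:            # back edge: would form a cycle
                acyclic[pkg].append(dep + "*")
            else:
                acyclic[pkg].append(dep)
                if dep not in visited:
                    visit(dep)
        on_stack.remove(pkg)

    for pkg in deps:
        if pkg not in visited:
            visit(pkg)
    return acyclic

print(break_cycles(deps))
# {'A': ['B', 'C'], 'B': ['D', 'C'], 'C': [], 'D': ['A*']}
```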
USE flags and combinations
Find the main article HERE
Gentoo packages are not all built with the same settings: users have the ability to customize their build with a variety of USE flags (essentially on-off switches for various features). This gives users unparalleled control, and is one of the most inviting features of the Gentoo operating system.
However, the USE flags also make it very difficult to test the packages for bugs, because for `n` USE flags there are literally `2^n` different ways to build the package. The server takes a shortcut and builds each package with at most 4 different combinations of USE flags (a sketch of the selection follows the list):
- No USE flags are set
- All USE flags are set
- Random combination not present in above two
- Random combination not present in above three
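A rough sketch of how such a combination list could be produced (hypothetical; the flag names are invented for illustration):

```python
import random

def use_flag_combinations(flags, extra=2):
    """Return at most 4 USE-flag combinations for a package:
    none set, all set, and `extra` distinct random ones."""
    none_set = frozenset()
    all_set = frozenset(flags)
    combos = {none_set, all_set}
    # Keep drawing random subsets until we have enough distinct ones
    # (or the flag set is so small that no more combinations exist).
    while len(combos) < 2 + extra and len(combos) < 2 ** len(flags):
        combos.add(frozenset(f for f in flags if random.random() < 0.5))
    return combos

for combo in use_flag_combinations(["ssl", "ipv6", "X"]):
    print(sorted(combo))
```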
Some of the USE flag combinations given by the above rules may not be legal. For example, if an ebuild specifies `REQUIRED_USE="^^ (a b c)"`, then exactly one flag out of `a`, `b`, and `c` should be enabled. This falls under neither the "no USE flags" nor the "all USE flags" category.
So, instead of choosing the flags purely at random, the server also needs to calculate the flag combinations that are legal for that package. The server does this by modelling the `REQUIRED_USE` constraint as a boolean satisfiability problem. After solving it with a SAT solver, it finds combinations that are allowed by the ebuild, preventing errors from this cause.
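For instance, the constraint `^^ (a b c)` (exactly one of the three) can be encoded in CNF and handed to a SAT solver. A minimal sketch, assuming the `pycosat` package (the actual server may use a different solver):

```python
import pycosat  # pip install pycosat

# Map USE flags to SAT variables: a=1, b=2, c=3.
# REQUIRED_USE="^^ (a b c)" means exactly one of a, b, c is enabled:
clauses = [
    [1, 2, 3],   # at least one of a, b, c
    [-1, -2],    # not both a and b
    [-1, -3],    # not both a and c
    [-2, -3],    # not both b and c
]

names = {1: "a", 2: "b", 3: "c"}
for solution in pycosat.itersolve(clauses):
    enabled = [names[v] for v in solution if v > 0]
    print("legal USE combination:", enabled)
# Prints ['a'], ['b'], ['c'] (solution order may vary).
```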
Server Architecture
The server uses Kubernetes for orchestration. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. You can read more about it here.
The server runs three Docker containers: "Server", "Flag Generator", and "Dependency Solver". There is also a MongoDB container which stores all of the information for the dependency graph.
A Kubernetes Service sits in front of each of "Flag Generator" and "Dependency Solver", which means that multiple containers of each kind can hide behind those services while a load balancer distributes the incoming requests among them.
Stabilization procedure
When a client runs the wrapper script for stabilization, it spawns a Docker container with a minimal Gentoo system. This system requests a package name from the server. When faced with this request, the server evaluates the DAG of packages and returns a leaf node.
Note that every package node has multiple USE flag combinations stored with it. The server selects one and sends the data to the client. The client sets its Portage settings to those USE flags and merges the package. The merge stops at the first error, or runs until the build is over if there aren't any. After the build, all the logs, as well as the build output, are tarred and uploaded to OpenStack Swift storage.
For the upload request, the server generates a time-limited upload token for the OpenStack storage and gives it to the client. Once the logs are uploaded, the server marks the package in the tree STABLE/UNSTABLE, depending on what was reported.
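Put together, the client side of this exchange looks roughly like the Python sketch below. The server address, endpoint names, and JSON fields are illustrative assumptions, not the server's actual API:

```python
import os
import subprocess

import requests  # pip install requests

SERVER = "http://orca-server.example.org:32000"  # hypothetical address

def stabilize_one():
    # 1. Ask the server for a leaf package and one USE-flag combination.
    job = requests.get(f"{SERVER}/job").json()  # illustrative endpoint
    package, use_flags = job["package"], job["use"]

    # 2. Build the package with those USE flags (inside the minimal
    #    Gentoo container spawned by the wrapper script).
    env = {**os.environ, "USE": " ".join(use_flags)}
    result = subprocess.run(["emerge", "--oneshot", package], env=env)
    status = "STABLE" if result.returncode == 0 else "UNSTABLE"

    # 3. Upload the tarred logs through the time-limited Swift URL the
    #    server handed out, then report the verdict back.
    with open("/tmp/build-logs.tar", "rb") as logs:
        requests.put(job["upload_url"], data=logs)
    requests.post(f"{SERVER}/report",
                  json={"package": package, "use": use_flags, "status": status})

if __name__ == "__main__":
    stabilize_one()
```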
Automated Builds with Travis
Travis CI is a service that allows code to be tested on every push by running it on their servers with a predefined set of parameters. Orca exploits this by using the service to build Gentoo packages as part of a client's continuous integration.
The repository has a separate branch, `trigger`. Whenever a Travis build needs to be triggered, the server makes a dummy commit on this branch. Travis detects this and runs `orca-client` as part of its continuous integration. Thus, for every trigger, one package-USE flag combination can be tested.
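A dummy commit of this kind can be produced with an empty git commit. A minimal sketch of what the server might run (the remote name and commit message are assumptions):

```python
import subprocess

# Hypothetical sketch: push an empty commit to the `trigger` branch so
# Travis CI starts a new build. Assumes the repository is already cloned
# and authenticated (e.g. via the key stored under /secret).
subprocess.run(["git", "checkout", "trigger"], check=True)
subprocess.run(["git", "commit", "--allow-empty",
                "-m", "Trigger Travis build"], check=True)
subprocess.run(["git", "push", "origin", "trigger"], check=True)
```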
There are some limits to Travis's capabilities, including a time limit (around 50 minutes), a cap on the size of the output log, and a limit on concurrent jobs (at most 2). However, it works well for many of our purposes. Currently, the server is configured to trigger Travis builds every 15 minutes.
Deploy your own stabilization server
Since the server is built completely on Kubernetes, it is very easy to deploy on a separate machine. However, there are a couple of problems one must be aware of:
- Trigger auth: To trigger builds in Travis CI, the server authenticates with GitHub to make a dummy commit in the `trigger` branch of the repository. The key for this authentication is stored in a folder on the VM with the filename `auth`. This folder is mounted to the `/secret` directory on the server container.
- Openstack storage: The server generates temp-URLs for the clients so that the clients may use the Swift storage to store the build logs. To generate the temp-URL, we need the OpenStack Swift secret. This secret is also stored in the same directory as mentioned above, with the filename `secret` (a sketch of the temp-URL computation follows this list).
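Swift temp-URLs are ordinary HMAC-SHA1 signatures over the method, expiry time, and object path. A minimal sketch of how the server can derive one from the stored secret (the key, host, and object path here are illustrative):

```python
import hmac
import time
from hashlib import sha1

def make_temp_url(key, method, path, ttl_seconds):
    """Build an OpenStack Swift TempURL query string for `path`
    (e.g. /v1/AUTH_account/container/object), valid for ttl_seconds."""
    expires = int(time.time()) + ttl_seconds
    body = f"{method}\n{expires}\n{path}"            # Swift's documented HMAC body
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return f"{path}?temp_url_sig={sig}&temp_url_expires={expires}"

# Hypothetical values: the real key lives in the /secret directory on the VM.
print(make_temp_url("swift-secret-key", "PUT",
                    "/v1/AUTH_orca/logs/build-1234.tar", ttl_seconds=3600))
```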
Once the above two are taken care of, follow the instructions on the Kubernetes website to set up a Kubernetes cluster for the platform you have. After that, simply run `kubectl create -f $REPO/Containers/Orca-Deployment.yml`, where `$REPO` is the path to the repository, and it should start the server.
The server by default exposes port 32000, on which all requests from the client are made. This and all other configuration can be changed in the Kubernetes config files. There are a couple of scripts in the `utilities` directory, in `kubeUtils.sh`, which can be modified according to the user's needs to make using the `kubectl` command line a lot more convenient.
If the logs show the flag solver crashing when requests are sent, it is either a bug in Kubernetes or one in our config files. Stopping and restarting the services seems to fix it when this happens. This indicates that either the order in which the services spawn somehow affects the bridge network created between the container services, or it is indeed a bug in k8s. Note that since services (not pods) are created instantly, the behaviour shouldn't depend on order.
Scope and future work
Work on this project is far from over. The server currently helps stabilize packages based on USE flags only, but there are many more variables involved when working with Gentoo. Things like the architecture, python_targets, and ruby_targets can still cause unforeseen problems, and the task is to modify the server to make the testing generic, irrespective of which factors are being tested.
Thanks
I have no words that can describe how grateful I am to my mentors - Sebastien Fabbro and Nitin Agarwal - for all their support, for being extremely responsive and helpful with every one of my problems. Without their vote of confidence, this project would've been a lot harder and a lot less fun to do.
I would be lucky to get to work with them further as I continue work on this project and try to get it accepted as an official Gentoo Project.
I am also thankful to the Gentoo organisation for the opportunity to work on this project, which helped me learn a lot in such a short period.