Running a big Jenkins cluster means monitoring many things. Lately we started recording which build ran on which machine and when. Jenkins actually provides this feature: it is called “Build history” and can be viewed for the whole cluster or for a particular node. Unfortunately, when the cluster is quite big (ours has more than 300 executors serving more than 10000 builds per day), Jenkins is not able to show the graph.
So we decided to build it ourselves. We already use Telegraf for monitoring Jenkins, InfluxDB for storing time series data and Grafana for displaying the graphs. Jenkins provides the information we need through its API, so all we had to do was make a new kind of request to the Jenkins master. The API URL is:
Instead of JSON you can use XML too, but I am parsing the result with Ruby, and JSON is more comfortable to parse there. We have to request not only executors, which are responsible for ordinary jobs, but also oneOffExecutors, which show the pipeline jobs. The number in executors is the executor number, and each entry also gives the job name, build number (and configuration for matrix jobs). Unfortunately, the number in oneOffExecutors is always -1, but we do not actually care about the executor number; we just need to know which build runs when and on which machine.
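To make the parsing step concrete, here is a minimal Ruby sketch of how such a response could be turned into a list of running builds. It is not our production script. It assumes the data comes from the computer/api/json endpoint with a tree filter, that each busy executor exposes the build through currentExecutable.url, and that build URLs look like .../job/NAME/BUILD/ (with an extra axis=value segment for matrix jobs). The helper names fetch_computers, parse_build_url and running_builds are made up for this example, and authentication is left out.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed tree filter: only the fields this sketch needs.
TREE = 'computer[displayName,' \
       'executors[number,currentExecutable[url]],' \
       'oneOffExecutors[number,currentExecutable[url]]]'

# Hypothetical helper: fetch the executor list from a Jenkins master.
# base_url is expected to end with a slash, e.g. "https://jenkins.example.com/".
def fetch_computers(base_url)
  uri = URI.join(base_url, 'computer/api/json')
  uri.query = URI.encode_www_form(tree: TREE)
  JSON.parse(Net::HTTP.get(uri))
end

# Hypothetical helper: split a build URL into job, config and build number.
# Folders show up as nested job/ segments and are joined with "/".
def parse_build_url(url)
  segments = URI(url).path.split('/').reject(&:empty?)
  build = segments.pop
  return nil unless build.to_s.match?(/\A\d+\z/)

  # Matrix builds carry an extra "axis=value" segment before the build number.
  config = segments.last.to_s.include?('=') ? segments.pop : nil
  start = segments.index('job')
  return nil if start.nil?

  job = segments[start..].each_slice(2).map { |_, name| name }.join('/')
  { job: job, build: Integer(build, 10), config: config }
end

# Collect every running build from both executors and oneOffExecutors.
def running_builds(payload)
  Array(payload['computer']).flat_map do |node|
    slots = Array(node['executors']) + Array(node['oneOffExecutors'])
    slots.filter_map do |slot|
      url = slot.dig('currentExecutable', 'url')
      next if url.nil? # idle executor, nothing to record

      parsed = parse_build_url(url)
      next if parsed.nil?

      parsed.merge(node: node['displayName'], executor: slot['number'])
    end
  end
end
```

Calling running_builds(fetch_computers('https://jenkins.example.com/')) would then return one hash per running build, with the executor number set to -1 for pipeline jobs.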
The parsed data gets written into the database so that node, job, build and config are used as tags and the only metric data is the executor number:
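As an illustration of that layout, the sketch below formats one running build as a line of InfluxDB line protocol, with node, job, build and config as tags and the executor number as the only field. The measurement name jenkins_build_history and the helper names are assumptions made for this example, and the input hash uses the keys :node, :job, :build, :config and :executor.

```ruby
# Escape the characters that are special inside line protocol tag values.
def escape_tag(value)
  value.to_s.gsub(/([,= ])/) { "\\#{$1}" }
end

# Format one running build as a line protocol record.
# The measurement name is a placeholder, not the one we actually use.
def to_line(build, measurement: 'jenkins_build_history', time: Time.now)
  tags = { node: build[:node], job: build[:job], build: build[:build], config: build[:config] }
         .reject { |_, value| value.nil? || value.to_s.empty? } # empty tag values are not allowed
         .map { |key, value| "#{key}=#{escape_tag(value)}" }
         .join(',')
  nanos = time.to_i * 1_000_000_000 + time.nsec # timestamp in nanoseconds
  "#{measurement},#{tags} executor=#{Integer(build[:executor])}i #{nanos}"
end

puts to_line({ node: 'node-1', job: 'my-job', build: 42, config: 'label=linux', executor: 3 })
```

Builds without a matrix configuration simply get no config tag, since line protocol does not accept empty tag values.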
Then in Grafana we have to query by node and group by all the other tags:
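A query along those lines might look like the InfluxQL text produced by the small Ruby helper below. This is a sketch, not the exact query from our dashboard: the measurement name jenkins_build_history, the field name executor and the helper name build_history_query are assumptions, while $timeFilter and $__interval are the macros Grafana substitutes for InfluxDB data sources.

```ruby
# Build an InfluxQL query for one node, grouped by the remaining tags,
# so that every job/build/config combination becomes its own series.
def build_history_query(node, measurement: 'jenkins_build_history')
  escaped = node.gsub("'") { "\\'" } # escape single quotes inside the string literal
  <<~QUERY.strip
    SELECT mean("executor") FROM "#{measurement}"
    WHERE "node" = '#{escaped}' AND $timeFilter
    GROUP BY time($__interval), "job", "build", "config" fill(null)
  QUERY
end

puts build_history_query('node-1')
```

In a real dashboard the node name would more likely come from a Grafana template variable than from a hard-coded string.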
This gives us a beautiful picture in which every straight line is one build of one job.