Benchmarking

This guide is about measuring flow execution speed on Sirveo, and how to assess performance for specific configurations. The aim of this guide is primarily to help you answer two questions:

How much work can Sirveo do for a specific workload and deployment scenario?

How much compute do I need to support a particular throughput on a specific workload?

In the following sections, we’ll cover:

Setting up Sirveo on a Linux cloud VM
Setting up flows with and without external IO latency
Generating concurrent, constant load via webhooks
Assessing benchmark results

Test VM setup

For these benchmarks, I’ll run Sirveo on the following droplet:

Droplet type: Basic
CPU Option: Premium Intel
2 GB memory / 2 CPUs
60 GB SSD
OS: Ubuntu 23.10

I’m opting for 2 CPUs, since we’ll be installing PostgreSQL alongside Sirveo on the same VM. This minimizes latency between the server and the database.

While the VM provisions, I’ll get an evaluation license. Then some initial setup.

sudo apt-get update
sudo apt-get upgrade
sudo reboot
# ... reconnect
lsb_release -a
#     Description:    Ubuntu 23.10

Then, a standard PostgreSQL installation, as per official docs.

# ... 3 commands omitted
sudo su postgres
psql

#     psql (16.2 (Ubuntu 16.2-1.pgdg23.10+1))
#     postgres=#

Prepare a role and database in PostgreSQL.

CREATE ROLE sirveo_app
  WITH PASSWORD 'e90e26352ca40543448f592646bb3bb7e260fd5666644c90'
  LOGIN;

CREATE DATABASE sirveo_db
  WITH OWNER sirveo_app;

Download & verify the latest stable Sirveo binary, and make it available in the system PATH.

sudo mkdir /opt/sirveo
cd /opt/sirveo
sudo wget https://dl.sirveo.io/stable/sirveo-amd64-linux .
sudo wget https://dl.sirveo.io/stable/sirveo-amd64-linux.sha256sum .

# always verify hashes
sha256sum -c sirveo-amd64-linux.sha256sum
#     sirveo-amd64-linux: OK

# make binary executable
sudo chmod +x sirveo-amd64-linux

# symlink into PATH
sudo ln -s /opt/sirveo/sirveo-amd64-linux /usr/bin/sirveo

# and verify
stat `which sirveo`
#     File: /usr/bin/sirveo -> /opt/sirveo/sirveo-amd64-linux

# confirm Sirveo version
sirveo --version
#     version: 0.2.3
#     commit: d8a3476
#     build time: 2024-03-30T06:52:52+02:00
#     build mode: licensed
#     database revision: 14

Create a system user to run the server as:

sudo useradd \
  --system --no-create-home \
  --user-group \
  --shell /usr/sbin/nologin \
  sirveo

Bootstrap a default configuration file:

sudo mkdir /etc/sirveo
sudo sirveo default-config --yml > /etc/sirveo/config.yml

# and make some adjustments
sudo vi /etc/sirveo/config.yml

Configure essential settings:

---
# eula & license
 eula_accept: "yes"
 license: "...my license"

# database settings
 db_name: sirveo_db
 db_user: sirveo_app
 db_password: e90e26352...

# default user creation
 create_default_user: true
 default_username: benchmark01
 default_password: eb8ebda8...

# and a JWT signing secret
 jwt_signing_secret: "874f8be78..."

The server will run via a systemd, using a unit file like this:

[Unit]
Description=sirveo
After=network.target

[Install]
WantedBy=multi-user.target

[Service]
Type=simple
User=sirveo
Group=sirveo
ExecStart=/usr/bin/sirveo --config-file=/etc/sirveo/config.yml

Restart=on-failure
RestartSec=10s
KillMode=process

Create the unit file, reload systemd config, tail the journal (new terminal), and start the server.

sudo vi /etc/systemd/system/sirveo.service
# ...
sudo systemctl daemon-reload

# (in another terminal)
journalctl -f -u sirveo

# start the server
sudo systemctl start sirveo

Sirveo is now up and running.

INF Sirveo v0.2.3
INF db connected host=localhost port=5432 sslmode=prefer
INF database is empty, running initial migrations
INF database migration complete
INF created default system user
INF [license] OLV activation OK status=200
INF [license] local verification OK expires=2024-04-25T15:47:52Z ...
INF [license] online verification OK expires=2024-04-06T05:27:36Z ...
INF starting inbound server address=127.0.0.1:7005 cors=false url=http://127.0.0.1:7005
INF starting admin server address=127.0.0.1:7001 cors=false url=http://127.0.0.1:7001
INF ready to flow 🟢

I’ll be using bombardier to generate some HTTP load. Since this is a short-lived test environment, I’ll grab the binary directly, instead of building from source.

sudo wget https://github.com/codesenberg/bombardier/releases/download/v1.2.6/bombardier-linux-amd64 \
  -O /usr/bin/bombardier

sudo chmod +x /usr/bin/bombardier

# (non-privileged user)
bombardier --version
#     bombardier version v1.2.6 linux/amd64

Using webhooks, it is easy to generate external load via HTTP requests.

Initial load test

For initial context, I want to establish an upper boundary for how fast the server can execute a flow. Performance is highly dependent on what the target flow is doing. So the best-case upper boundary needs a minimal flow without external IO.

Log in at http://localhost:7001
Create a new graph flow, with only the default passthrough node
Activate the flow
Create a new webhook for the graph flow:

identifier=bench01 output=Last

Sirveo is currently using about 24 MiB of reserved memory, and the VM is otherwise idling.

Now we’re ready to generate some HTTP traffic. I’m using the server’s inbound port, although the admin port will deliver similar results.

# 10,000 flow executions, with 1 TCP connection
bombardier -c 1 -n 10000 -l \
    http://127.0.0.1:7005/w/bench01

Results:

Bombarding http://127.0.0.1:7005/w/bench01 with 10000 request(s) using 1 connection(s)
...
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      2440.72     286.11    3210.22
  Latency      405.89us   156.52us     5.91ms
  Latency Distribution
     50%   367.00us
     75%   428.00us
     90%   516.00us
     95%   602.00us
     99%     0.97ms
  HTTP codes:
    1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     0.91MB/s

The server manages up to 3210 requests per second, with an average of 2440. Notably, it maintains sub-millisecond response times for 99% of the 10000 requests. Reserved memory increased to about 32 MiB during the test.

What exactly is being measured here?

How fast the server can execute a minimal graph flow, via inbound HTTP requests, one at a time.

With concurrency

Let’s introduce some concurrency by using 2 simultaneous connections.

# 10,000 flow executions, with 2 TCP connection
bombardier -c 2 -n 10000 -l \
    http://127.0.0.1:7005/w/bench01

Bombarding http://127.0.0.1:7005/w/bench01 with 10000 request(s) using 2 connection(s)
...
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec      3472.77     711.66    4653.53
  Latency      571.89us   571.39us    15.47ms
  Latency Distribution
     50%   462.00us
     75%   597.00us
     90%   791.00us
     95%     0.97ms
     99%     2.65ms
  HTTP codes:
    1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:     1.30MB/s

On average, the server now reaches 3472 flow executions, and response times still looks good. Increasing concurrency will not improve throughput beyond this point, noting that bombardier itself is using about 20% of CPU capacity for coordinating HTTP requests.

This test only serves as an indication of an upper boundary of execution speed, with a flow that is practically doing no internal work. Further note that memory usage is low because there’s very little data moving around within the flow.

With external latency

Next, we need to understand what throughput and resource utilization looks like when external latency is introduced.

I’ll create a new graph flow.

The HTTP node makes a request to an external service which is not going to rate-limit my requests. In this case I’m making HTTP requests directly to another Sirveo server on a different VM. Latency to the external service is roughly 4ms, for about 8ms round-trip time on HTTP responses.

The JS Code node converts a response status code into a string value, which also adds some more CPU-bound work and more internal data objects:

$out.status_code = $out.status_code.toString()

The status node passes the HTTP response from the upstream server back to the webhook output. This is to confirm that the upstream is providing consistent 200 responses.

The HTTP request can account for about 94% of this flow’s execution time.

And lastly, I’ll create a new webhook (bench02) that runs the new graph flow.

The aim of the measurement is now to find the highest average throughput, under constant load. Since the flow is spending some time waiting on responses, and I know that the upstream will deal with concurrency well, I can find the answer by gradually increasing concurrency on the benchmark.

Since this flow will be operating on more data internally, memory usage will become more aggressive as concurrency increases. It’s a good idea to constrain the server’s memory usage, which avoids the OS killing the server process. Configure a target memory limit in the unit file:

Environment=GOMEMLIMIT=1024MiB

ExecStart=/usr/bin/sirveo --config-file=/etc/sirveo/config.yml

1 GiB of memory is more than enough for this workload. After a systemd daemon-reload and a server restart, let’s test 10,000 requests with 2 connections:

# 10,000 flow executions, with 2 TCP connections
bombardier -c 2 -n 10000 -l \
    http://127.0.0.1:7005/w/bench02

About 1m30s into the test, htop reports a 1m load average of 1.06, or around 53% given 2 CPUs.

When the test completes:

Bombarding http://127.0.0.1:7005/w/bench02 with 10000 request(s) using 2 connection(s)
...
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       269.27      54.81     446.71
  Latency        7.42ms     1.76ms    46.64ms
  Latency Distribution
     50%     7.04ms
     75%     8.02ms
     90%     9.30ms
     95%    10.51ms
     99%    14.95ms
  HTTP codes:
    1xx - 0, 2xx - 10000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    83.06KB/s

On average, two connections achieves 269 requests per second, with peak throughput reported as 446 requests per second. Variability between the average and peak throughput is expected, since compute and network resources are available on a best-effort basis.

The average latency of around 8ms is in the expected ballpark, and the latency distribution looks reasonable. All 10,000 requests succeeded with a 200 status, which is coming from the upstream HTTP endpoint.

Observe what happens with 4 connections, which doubles the concurrent flow executions, and over a 2-minute period.

bombardier -c 4 -d 2m -l http://127.0.0.1:7005/w/bench02

At 1m30s into the test, the 1m load average reaches 1.84, or approximately 92% of 2 CPUs.

Bombarding http://127.0.0.1:7005/w/bench02 for 2m0s using 4 connection(s)
...
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       203.57      99.48     647.36
  Latency       19.65ms    22.03ms      1.12s
  Latency Distribution
     50%    13.21ms
     75%    22.36ms
     90%    42.94ms
     95%    56.78ms
     99%    94.03ms
  HTTP codes:
    1xx - 0, 2xx - 24412, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    62.76KB/s

Peak throughput increases by 45% to 647 requests per second, while, average throughput decreases to about 203 requests per second. While all 24,412 requests completed successfully, response times are beginning to deteriorate.

We’ll double concurrency again to see how throughput holds up.

# 25,000 flow executions, with 8 TCP connection
bombardier -c 8 -n 25000 -l http://127.0.0.1:7005/w/bench02

Load average sits at 1.77, or 88%, which is similar to the previous test.

Bombarding http://127.0.0.1:7005/w/bench02 with 25000 request(s) using 8 connection(s)
 25000 / 25000 [==================================================] 100.00% 195/s 2m8s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       195.52     122.34     955.83
  Latency       40.92ms    59.12ms      1.36s
  Latency Distribution
     50%    15.18ms
     75%    37.91ms
     90%   133.59ms
     95%   170.56ms
     99%   247.65ms
  HTTP codes:
    1xx - 0, 2xx - 25000, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    60.30KB/s

Average throughput is about the same at 195 requests per second. The increased concurrency improves peak throughput by a further 47%, reaching 955 requests per second, but average response times decrease to 41ms.

Summary

These are basic but reliable approaches to help assess the performance of particular workloads.

Sirveo is designed to handle concurrent workloads, and will take advantage of multiple CPU cores, if available. The general consideration for compute requirements and utilization are;

CPU usage scales with the complexity and number of nodes in flows
Memory usage scales with the amount of data being processed within flows
Both CPU & memory usage scales with the number of flows running concurrently

These directly translates into three primary questions for sizing system resources for a Sirveo deployment:

How much peak concurrency is the deployment expected to handle?
How much peak throughput (requests per second) should a deployment support?
If external requests (webhooks, links) are triggering flows, at what point will bursty inbound loads overwhelm my system?

Need Help?

For expert assistance with sizing for your deployment scenario, get in touch.