Benchmarking
This guide covers measuring flow execution speed on Sirveo, and how to assess performance for specific configurations. It is primarily intended to help you answer two questions:
- How much work can Sirveo do for a specific workload and deployment scenario?
- How much compute do I need to support a particular throughput on a specific workload?
In the following sections, we’ll cover:
- Setting up Sirveo on a Linux cloud VM
- Setting up flows with and without external IO latency
- Generating concurrent, constant load via webhooks
- Assessing benchmark results
Test VM setup
For these benchmarks, I’ll run Sirveo on the following droplet:
I’m opting for 2 CPUs, since we’ll be installing PostgreSQL alongside Sirveo on the same VM. This minimizes latency between the server and the database.
While the VM provisions, I’ll get an evaluation license. Then some initial setup.
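If the droplet runs a stock Ubuntu image, the initial setup is little more than bringing packages up to date; the exact steps will depend on your image:

```bash
# Basic housekeeping on a fresh Ubuntu droplet (illustrative; adapt to your image)
sudo apt update && sudo apt upgrade -y
```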
Then, a standard PostgreSQL installation, as per official docs.
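On Ubuntu, for example, this amounts to something like the following; the PGDG repository route from the official docs works just as well:

```bash
# Install PostgreSQL from the distribution packages and make sure it is running
sudo apt install -y postgresql postgresql-contrib
sudo systemctl enable --now postgresql
```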
Prepare a role and database in PostgreSQL.
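Something along these lines; the role and database names are illustrative:

```bash
# Create a dedicated role (prompted for a password) and a database owned by it
sudo -u postgres createuser --pwprompt sirveo
sudo -u postgres createdb --owner=sirveo sirveo
```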
Download & verify the latest stable Sirveo binary, and make it available in the system PATH.
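The URLs below are placeholders; use the actual download and checksum locations from the Sirveo releases page:

```bash
# Download the release binary and its checksum file (placeholder URLs)
wget https://example.com/sirveo/latest/sirveo-linux-amd64
wget https://example.com/sirveo/latest/sirveo-linux-amd64.sha256

# Verify the checksum, then install the binary into the system PATH
sha256sum -c sirveo-linux-amd64.sha256
sudo install -m 0755 sirveo-linux-amd64 /usr/local/bin/sirveo
```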
Create a system user to run the server as:
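For example:

```bash
# Dedicated, non-login system user and home directory for the server
sudo useradd --system --create-home --home-dir /var/lib/sirveo \
  --shell /usr/sbin/nologin sirveo
```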
Bootstrap a default configuration file:
Configure essential settings:
The server will run as a systemd service, using a unit file like this:
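(The binary path, flags, and config location below are illustrative; adapt them to your install.)

```ini
[Unit]
Description=Sirveo server
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
Type=simple
User=sirveo
Group=sirveo
# ExecStart is a placeholder -- use the actual binary path and configuration flag
ExecStart=/usr/local/bin/sirveo --config /etc/sirveo/config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```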
Create the unit file, reload systemd config, tail the journal (new terminal), and start the server.
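Concretely:

```bash
# Create the unit file with your editor of choice
sudoedit /etc/systemd/system/sirveo.service

# Reload systemd so it picks up the new unit
sudo systemctl daemon-reload

# In a new terminal, tail the server's journal
sudo journalctl -u sirveo -f

# Start the server
sudo systemctl start sirveo
```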
Sirveo is now up and running.
I’ll be using bombardier to generate some HTTP load. Since this is a short-lived test environment, I’ll grab the binary directly, instead of building from source.
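Something like this; check the project’s releases page for the current version and asset name:

```bash
# Grab a prebuilt bombardier release and put it on the PATH (asset name may differ)
wget -O bombardier https://github.com/codesenberg/bombardier/releases/latest/download/bombardier-linux-amd64
chmod +x bombardier
sudo mv bombardier /usr/local/bin/
```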
Using webhooks, it is easy to generate external load via HTTP requests.
Initial load test
For initial context, I want to establish an upper boundary for how fast the server can execute a flow. Performance is highly dependent on what the target flow is doing. So the best-case upper boundary needs a minimal flow without external IO.
- Log in at http://localhost:7001
- Create a new graph flow, with only the default passthrough node
- Activate the flow
- Create a new webhook for the graph flow: identifier=bench01, output=Last
Sirveo is currently using about 24 MiB of reserved memory, and the VM is otherwise idling.
Now we’re ready to generate some HTTP traffic. I’m using the server’s inbound port, although the admin port will deliver similar results.
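Assuming the inbound listener is on port 7000 and the webhook is reachable under its bench01 identifier (both are assumptions; substitute your server’s actual inbound port and webhook URL), the run looks like this:

```bash
# 10,000 requests over a single connection; -l prints the latency distribution
# Port and path are assumptions -- use your server's inbound port and bench01 webhook URL
bombardier -c 1 -n 10000 -l http://localhost:7000/hook/bench01
```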
Results:
The server manages up to 3,210 requests per second, with an average of 2,440. Notably, it maintains sub-millisecond response times for 99% of the 10,000 requests. Reserved memory increased to about 32 MiB during the test.
What exactly is being measured here?
How fast the server can execute a minimal graph flow, via inbound HTTP requests, one at a time.
With concurrency
Let’s introduce some concurrency by using 2 simultaneous connections.
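Same run as before, now with two connections:

```bash
# 10,000 requests across 2 concurrent connections (same webhook URL assumption as before)
bombardier -c 2 -n 10000 -l http://localhost:7000/hook/bench01
```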
On average, the server now reaches 3,472 flow executions per second, and response times still look good. Increasing concurrency will not improve throughput beyond this point, noting that bombardier itself is using about 20% of CPU capacity to coordinate HTTP requests.
This test only serves as an indication of the upper boundary of execution speed, with a flow that does practically no internal work. Further note that memory usage is low because there’s very little data moving around within the flow.
With external latency
Next, we need to understand what throughput and resource utilization look like when external latency is introduced.
I’ll create a new graph flow.
The HTTP node makes a request to an external service which is not going to rate-limit my requests. In this case I’m making HTTP requests directly to another Sirveo server on a different VM. Latency to the external service is roughly 4ms, for about 8ms round-trip time on HTTP responses.
The JS Code node converts a response status code into a string value, which also adds some more CPU-bound work and more internal data objects.
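A sketch of what that script might look like; the input and output bindings are assumptions, not Sirveo’s documented JS node API:

```js
// Illustrative only -- how the node receives input and returns output
// depends on Sirveo's JS node API; the bindings below are assumptions.
const status = input.response.status;     // HTTP status code from the upstream request
return { statusText: String(status) };    // pass the status downstream as a string value
```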
The status node passes the HTTP response from the upstream server back to the webhook output. This is to confirm that the upstream is providing consistent 200 responses.
The HTTP request can account for about 94% of this flow’s execution time.
And lastly, I’ll create a new webhook (bench02) that runs the new graph flow.
The aim of the measurement is now to find the highest average throughput, under constant load. Since the flow is spending some time waiting on responses, and I know that the upstream will deal with concurrency well, I can find the answer by gradually increasing concurrency on the benchmark.
Since this flow will be operating on more data internally, memory usage will grow faster as concurrency increases. It’s a good idea to constrain the server’s memory usage, so that the OS doesn’t kill the server process under memory pressure. Configure a target memory limit in the unit file, for example with systemd’s MemoryMax= directive:
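```ini
# Added under [Service] in sirveo.service -- cap the service's memory usage
[Service]
MemoryMax=1G
```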
1 GiB of memory is more than enough for this workload. After a systemd daemon-reload and a server restart, let’s test 10,000 requests with 2 connections:
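(Same URL assumptions as before, now pointed at bench02.)

```bash
# 10,000 requests, 2 concurrent connections, against the bench02 webhook
bombardier -c 2 -n 10000 -l http://localhost:7000/hook/bench02
```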
About 1m30s into the test, htop reports a 1m load average of 1.06, or around 53% given 2 CPUs.
When the test completes:
On average, two connections achieve 269 requests per second, with peak throughput reported as 446 requests per second. Variability between the average and peak throughput is expected, since compute and network resources are available on a best-effort basis.
The average latency of around 8ms is in the expected ballpark, and the latency distribution looks reasonable. All 10,000 requests succeeded with a 200 status, which is coming from the upstream HTTP endpoint.
Now observe what happens with 4 connections over a 2-minute period, which doubles the number of concurrent flow executions.
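Using bombardier’s duration flag instead of a fixed request count (same URL assumptions):

```bash
# 4 concurrent connections for a fixed 2-minute duration
bombardier -c 4 -d 2m -l http://localhost:7000/hook/bench02
```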
At 1m30s into the test, the 1m load average reaches 1.84, or approximately 92% of 2 CPUs.
Peak throughput increases by 45% to 647 requests per second, while average throughput decreases to about 203 requests per second. All 24,412 requests completed successfully, but response times are beginning to deteriorate.
We’ll double concurrency again to see how throughput holds up.
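That is 8 connections, again over 2 minutes (same URL assumptions):

```bash
# 8 concurrent connections for 2 minutes
bombardier -c 8 -d 2m -l http://localhost:7000/hook/bench02
```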
Load average sits at 1.77, or 88%, which is similar to the previous test.
Average throughput is about the same at 195 requests per second. The increased concurrency improves peak throughput by a further 47%, reaching 955 requests per second, but average response times deteriorate to 41ms.
Summary
These are basic but reliable approaches to help assess the performance of particular workloads.
Sirveo is designed to handle concurrent workloads, and will take advantage of multiple CPU cores, if available. The general considerations for compute requirements and utilization are:
- CPU usage scales with the complexity and number of nodes in flows
- Memory usage scales with the amount of data being processed within flows
- Both CPU & memory usage scale with the number of flows running concurrently
These translate directly into three primary questions for sizing system resources for a Sirveo deployment:
- How much peak concurrency is the deployment expected to handle?
- How much peak throughput (requests per second) should a deployment support?
- If external requests (webhooks, links) are triggering flows, at what point will bursty inbound loads overwhelm my system?
Need Help?
For expert assistance with sizing for your deployment scenario, get in touch.