
Testing Gunicorn WSGI + Meinheld for serving Flask API

My very first post is dedicated to choosing a well-optimized WSGI (Web Server Gateway Interface) server through benchmark testing. Many of our backend microservice apps are written in Python Flask, and nowadays there are a lot of different WSGI servers to run such apps. I've found this article to be a good general overview.

From my personal experience: I had been using mod_wsgi from Apache for a long while. However, it is a very painful experience to use once your Python version is ≥ 3.7: the libapache2-mod-wsgi-py3 module needs to be built against Python 3.7 or higher, but right now the latest Python version it supports is 3.5. This is why most teams have moved to standalone WSGI servers like uWSGI and Gunicorn. I managed to work around the issue with mod_wsgi-express, but it has quite a few limitations.

Gunicorn is a very popular WSGI server nowadays, proven to be a good, consistent performer for medium loads. Meinheld is an asynchronous web server like gevent (it makes your code switch between requests on I/O), but it has been shown to be faster than gevent. Also, according to the documentation, Meinheld can easily be used together with Gunicorn, which makes this combination very powerful for handling a large amount of traffic.

In this post I will share our benchmark results for a very simple Flask app running on plain Gunicorn vs Gunicorn + Meinheld with different numbers of workers. The tests were run both on a simple VM instance (1 vCPU, 3.75 GB memory) and on a Google Cloud Run node with the configuration mentioned later.


Testing setup

The Flask app consists of a single page returning the Python version.
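The exact app is in the repository linked at the end of this post; a minimal sketch of what such a single-page app looks like (the module layout and route name here are my assumptions):

    import sys

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Return the interpreter version so it's easy to see which Python serves the app
        return sys.version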

According to the Meinheld documentation, starting Gunicorn with the Meinheld worker is pretty straightforward:

exec gunicorn -k egg:meinheld#gunicorn_worker -c "$GUNICORN_CONF" "$APP_MODULE"

In addition, workers_per_core needs to be set in the Gunicorn config file. You can find my example here.
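The linked config isn't reproduced here, so below is a minimal sketch of what such a gunicorn_conf.py could contain; the environment variable name and defaults are assumptions, not the exact values from my repository:

    # gunicorn_conf.py -- a minimal sketch, not the exact config from the repo
    import multiprocessing
    import os

    # Number of Gunicorn workers to spawn per CPU core
    workers_per_core = float(os.getenv("WORKERS_PER_CORE", "1"))

    cores = multiprocessing.cpu_count()
    # Keep at least 2 workers so one blocked worker doesn't stall the whole app
    workers = max(int(workers_per_core * cores), 2)

    bind = "0.0.0.0:8000"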


Testing results

For our testing we used wrk. I've found it to be a very intuitive HTTP benchmarking tool, and it is able to generate quite a significant load even on small machines. Another option would be the ab benchmarking tool from Apache. It is also very intuitive; however, on our machine with 16 vCPUs it was already failing after surpassing 1000 concurrent connections.

This is how a typical test command looks using wrk with 1 thread and 800 concurrent connections:

wrk -d 20s -t 1 -c 800 [URL]

These are our results when running on a VM instance with 1 vCPU in Google Cloud.

Marked yellow: Meinheld served by Gunicorn (4 workers); marked blue: plain Gunicorn WSGI


Testing results with Google Cloud Run

When creating a Google Cloud Run node (1 CPU, 512 MiB), there are a few settings that can be adjusted.

When you have an instance pool (like Google Cloud Run or Kubernetes) or another similar system for managing distributed containers across multiple machines, it is possible to handle replication at the cluster level instead of using a process manager in each container that starts multiple worker processes.
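In that setup each container would run a single worker process and leave scaling to the platform; a tiny illustration of the idea (an assumption, not the configuration used in the tests below):

    # gunicorn_conf.py for cluster-managed replication (illustrative only)
    workers = 1  # one worker per container; Cloud Run/Kubernetes adds containers instead
    bind = "0.0.0.0:8080"  # Cloud Run routes traffic to port 8080 by default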

However, ideally we would like to avoid autoscaling whenever possible. Autoscaling is better used as a response to falling efficiency than as a way of simply spawning many nodes.

And according to our tests, using Meinheld helps with efficiency even in the case of instance replication.

During our tests we used a Cloud Run node with 1 CPU. We tweaked the maximum number of requests per container and compared plain Gunicorn against Meinheld served by Gunicorn with 1/3/4 workers. We also tried autoscaling with a maximum of 1 and 3 instances.

The main goal is to find the maximum number of concurrent requests per instance at which it still works efficiently. This max number should allow the instance to process requests with as little delay as possible.

The results clearly show the efficiency improvement with Meinheld and a higher number of workers.

Then we increased the maximum number of autoscaled instances to 3. The difference between 1 and 4 workers nearly disappeared; however, with plain Gunicorn without Meinheld the average latency was around 70% higher.

Reducing the number of concurrent requests sent during testing still showed that Meinheld improves efficiency even when there is no abundance of traffic.

By decreasing and increasing the number of concurrent requests to the node, we eventually concluded what the best Google Cloud Run configuration with 1 CPU would be in our case. It turned out the node is able to handle around 1.1k requests per second. Finally, we set up the Google Cloud Run node with 1000 max requests per container and 3 Meinheld workers.

Here is my GitHub repository with setup files to run the Docker containers for testing.

https://github.com/iskandre/gunicorn-test


Please feel free to share your thoughts, ideas, or questions.

Thank you.