Combining asynchronous with parallel: Boosting both I/O bound and CPU bound tasks
From the performance perspective, there are two common types of bottlenecks. One occurs when the application must handle too many inputs or outputs; the other appears when the application performs long-running calculations. When, for example, there are hundreds of inputs and some of them require a few seconds to process, the simple solution would be to put them in a queue and process them one by one. This may be acceptable for batch mathematical tasks, but it is useless for web servers or real-time applications. Imagine what would happen if each user had to wait a few minutes before accessing a website that shows them some data. They would probably get annoyed, and some of them would abandon the application completely.
However, programming has advanced considerably over the past decades, and there are interesting concepts for addressing both I/O bound issues and calculation-heavy CPU bound issues. The concept that helps to solve the I/O bottleneck is called asynchronous programming, while the approach that helps to solve CPU bound issues and the lags that come with them is called parallel programming. Both approaches assume that we can fairly distribute the available resources across all inputs or incoming events. The asynchronous approach is based on a so-called asynchronous loop (commonly known as the event loop), which runs a piece of code processing one input, then switches to a piece of code processing a second input, and then returns to the first input and continues with its next piece of code.
The code is divided into sections using await statements. The programmer needs to make sure that the code between two subsequent await statements does not take much time; otherwise the I/O bound bottleneck would simply be replaced by a CPU bound one. To put it simply, your code needs to be divided into a set of functions (in Python defined using def) that in asynchronous programming are called coroutines (in Python defined using async def) and that contain the await statements. Say we have a function processing a web user request that needs to fetch information about the user from a database. Normally, there would be a blocking function call to get the data from the database, which would take some time and thus stall the whole application as a side effect. However, if the function is replaced by a coroutine and the call is made via an await statement, the processing of the user request is suspended, and some other request from another user can be served in the meantime. When the database responds, the loop resumes the first user's coroutine exactly at the await where it was suspended.
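Before looking at a full web server, this suspend-and-resume mechanism can be illustrated with a minimal, self-contained sketch. The coroutine names and delays below are hypothetical; asyncio.sleep stands in for a database or network call, and while one coroutine awaits it, the event loop runs the other:

```python
import asyncio

async def handle_request(name, delay):
    print(f"{name}: section before await")
    await asyncio.sleep(delay)  # control returns to the event loop here
    print(f"{name}: section after await")
    return name

async def main():
    # Both requests are served concurrently: the total time is roughly
    # the maximum of the delays, not their sum.
    return await asyncio.gather(
        handle_request("request-1", 0.2),
        handle_request("request-2", 0.1),
    )

print(asyncio.run(main()))
```

Note that request-2 finishes its await first even though request-1 was started first, which is exactly the interleaving described above.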
Parallel programming, on the other hand, does not use concepts such as the asynchronous loop; it simply utilizes additional CPU threads and processes managed by the operating system. It typically splits a computationally heavy task into subtasks, which are computed on separate threads, and then merges the subresults into the final result (the map-reduce approach). The more divisible the task and the more CPU cores available, the faster the computation. You can already see that the asynchronous and parallel concepts are quite different and suited to different tasks. Unfortunately, there is a history of using parallel programming to handle I/O bound bottlenecks, which led to a loss of scalability when handling heavy real-time workloads that are typical not only of web servers, but especially of cyber security and artificial intelligence.
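The map-reduce approach can be sketched as follows. The function names and the chunking scheme are hypothetical illustrations; a ThreadPoolExecutor is used here for portability, though for pure-Python CPU-bound work a ProcessPoolExecutor (which has the same interface) sidesteps the GIL and actually uses multiple cores:

```python
import concurrent.futures

def sum_of_squares(chunk):
    # The "map" step: a computation over one chunk of the input.
    return sum(i * i for i in chunk)

def parallel_sum_of_squares(n, workers=4):
    # Split 0..n-1 into interleaved chunks:
    # worker k handles k, k + workers, k + 2 * workers, ...
    chunks = [range(start, n, workers) for start in range(workers)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    # The "reduce" step: merge the subresults into the final result.
    return sum(partial_sums)

print(parallel_sum_of_squares(1000))  # prints 332833500
```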
Let us now take a look at the following Python code, which properly handles user requests using asynchronous programming while leaving the computationally heavy task to parallel programming. The two concepts are connected via an executor, which runs a piece of code (written in the parallel style) behind an asynchronous coroutine, thus making it look like an I/O bound task. Python 3 provides the asyncio library for asynchronous programming in its standard library, and there is a popular third-party library for asynchronous web servers named aiohttp, so the code is short and simple:
```python
import asyncio
import concurrent.futures
import time

import aiohttp.web

# Executor creates a thread pool (parallel programming)
EXECUTOR = concurrent.futures.ThreadPoolExecutor(
    max_workers=10, thread_name_prefix="webserver_")

# Create the asynchronous application
APP = aiohttp.web.Application()

# CPU-bound task: a plain function (def), executed in the thread pool
def handle_task(request):
    print("This is some CPU heavy task.")
    time.sleep(1)
    return "slept for 1 second."

# I/O bound task
# Asynchronous coroutine called when a user hits the "hello" endpoint
async def hello(request):
    print("Here we are handling the user request. We can do some more work here.")
    # Switch to parallel programming: run the CPU-bound function in the
    # executor and await its result without blocking the event loop
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(EXECUTOR, handle_task, request)
    return aiohttp.web.Response(text="Hello, {}".format(result))

# Register the "hello" web server endpoint and run the application
APP.add_routes([aiohttp.web.get('/hello', hello)])
aiohttp.web.run_app(APP)
```
Please note that the I/O bound task is a coroutine (async def), while the CPU bound task is a function (def). This simple check makes sure we split the application properly, as described above. Now open two terminals. In one of them, simply run the program:
```shell
$ python3 webserver.py
======== Running on http://0.0.0.0:8080 ========
(Press CTRL+C to quit)
```
You can see that the web server is listening on localhost at port 8080/TCP. So in the other terminal you can send an HTTP request via curl (or through your favorite web browser) to the hello endpoint defined in the code above:
```shell
$ curl http://localhost:8080/hello
Hello, slept for 1 second.
```
You can send hundreds of HTTP requests at once using a script, and you will see that all of them are handled concurrently, limited only by the parametrization of the executor (here, max_workers=10). Of course, a real CPU heavy task still needs to be implemented in the handle_task function; the time.sleep call in the sample merely simulates work and is not itself something that parallelism can speed up.
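As a hypothetical illustration of what a genuinely CPU-bound handle_task might look like, the sketch below counts primes by deliberately naive trial division. The signature matches the sample above, so it could be passed to loop.run_in_executor unchanged; the prime-counting workload itself is an invented stand-in, not part of the original example:

```python
def handle_task(request):
    # Count primes below a bound by trial division -- deliberately naive,
    # so the handler is dominated by CPU time rather than I/O.
    limit = 50_000
    primes = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            primes += 1
    return f"found {primes} primes below {limit}"
```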