Improve the performance with asynchronous functions to run processes in parallel

Disclaimer: the methods and code examples in this article are the fruit of my own investigation and self-learning, and they are by no means ready to be used in a real production environment. Use at your own risk. To learn more about parallel and asynchronous programming, I recommend reading the official documentation at: https://docs.microsoft.com/dotnet/standard/parallel-processing-and-concurrency

This post is part of a series of 3 articles about performance improvement with parallel programming:

1. Improve the performance with asynchronous functions to run processes in parallel (this article)
2. Make use of ConcurrentBag to store the results from asynchronous processes
3. Parallel programming in C# with class Parallel

Recently I was collaborating with a colleague at work because the running time of a process was far slower than expected. Reviewing the code, we identified parts of the process that could be done in parallel, which improved the overall performance by 200%. In this post, I'll show some basic techniques for parallelizing processes and the performance improvements that can be achieved.

Starting point

I am using a simple Console application in C# to showcase an elementary case and see how we can improve the performance using parallel computing techniques. Sample source code can be found here: https://github.com/sgisbert/parallelization

The starting code is as follows (view on GitHub):

using System.Threading;
using System.Diagnostics;
using System;

namespace parallel
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch timer = new Stopwatch();
            timer.Start();

            for (int i = 0; i < 10; i++)
            {
                Process(i);
            }

            Console.WriteLine($"Completed: {timer.Elapsed}");
        }

        private static void Process(int id)
        {
            Stopwatch timer = new Stopwatch();
            timer.Start();
            Thread.Sleep(200);
            Console.WriteLine($"Process {id}: {timer.Elapsed}");
        }
    }
}

We have a process that takes 200 ms to complete, and we call it 10 times in a row with a for loop. As expected, the run takes about 2 seconds to complete, as each process has to wait for the previous one to finish before it starts:

Process 0: 00:00:00.2018134
Process 1: 00:00:00.2003564
Process 2: 00:00:00.2003647
Process 3: 00:00:00.2007673
Process 4: 00:00:00.2008702
Process 5: 00:00:00.2004851
Process 6: 00:00:00.2003682
Process 7: 00:00:00.2009726
Process 8: 00:00:00.2009657
Process 9: 00:00:00.2006482
Completed: 00:00:02.0362772

Making the method asynchronous

The goal is for these 10 processes to run asynchronously and in parallel, so that none of them has to wait for the others before starting. The first step is to convert the Process() method into an asynchronous function, with the following changes (see full file on GitHub):

Change method signature:

private static async Task Process(int id)

Change the call:

await Process(i);

Change the Main method signature, so it is asynchronous as well:

static async Task Main(string[] args)

With these changes, we have our asynchronous method, but is the performance better now? If we run the application again, we get exactly the same results as at the beginning; that is, we still have the same problem.

This is due to the await keyword, with which we are literally saying "wait until it finishes before continuing". That is, each call has to complete before the next one starts, so the asynchronous method is still executed one call at a time, and this is not what we want.
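To make the effect of awaiting inside the loop concrete, here is a minimal sketch. It is not part of the sample repository: WorkAsync is a hypothetical stand-in that simulates 200 ms of asynchronous work with Task.Delay, and the class and method names are assumptions chosen for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class SequentialAwaitDemo
{
    // Hypothetical stand-in for an asynchronous operation of about 200 ms.
    public static async Task WorkAsync(int id)
    {
        await Task.Delay(200);
        Console.WriteLine($"Work {id} done");
    }

    // Awaits each call before starting the next one, so the delays add up.
    public static async Task<TimeSpan> RunSequentiallyAsync(int count)
    {
        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            // The loop does not advance until this task has completed.
            await WorkAsync(i);
        }
        return timer.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan elapsed = await RunSequentiallyAsync(3);
        // Expected to be at least about 3 x 200 ms, since nothing overlaps.
        Console.WriteLine($"Completed: {elapsed}");
    }
}
```

Even though WorkAsync is genuinely asynchronous here, the total time grows with the number of iterations, because each iteration only starts once the previous task has finished.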

Executing the process asynchronously

To run the processes asynchronously, we need to call the function without the await keyword. In that case, the call gives us the Task object itself instead of the function's return value (although it was void in this sample, it could be any data type), and execution continues without waiting for the process to complete. We still need to know when all the processes have completed before continuing with the main thread. For that, we make the following changes (see full file on GitHub):

We declare a list of Task objects to keep track of every task created as the processes are started:

List<Task> tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
     tasks.Add(Process(i));
}
Task.WaitAll(tasks.ToArray());

We wait for all the tasks to complete with Task.WaitAll()

With these changes, we can run the program again and, surprise! We still have the same poor performance as at the start... What happened?

This happens because marking a method as async does not by itself start a new thread. An async method runs synchronously on the caller's thread until it reaches an await on an operation that has not completed yet. In this simple case, the Process() method is declared as async but contains no await, and Thread.Sleep() blocks the calling thread, so each call runs to completion before the loop continues. The compiler even reports a warning (CS1998) saying that the async method lacks await operators and will run synchronously.

If the method made truly asynchronous calls, like reading some data from a DB with the async methods of EF Core, it would return control to the caller at the first await on an incomplete operation, the calls would overlap, and the following step wouldn't be necessary.
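The difference between the two situations can be observed by checking whether the returned Task is already completed when the call returns. The following is a small sketch under stated assumptions: BlockingProcess and YieldingProcess are hypothetical names, and Task.Delay stands in for a real asynchronous I/O call.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SynchronousAsyncDemo
{
    // async method with no await: the whole body runs on the caller's thread.
    // The pragma only silences warning CS1998 so the sketch compiles cleanly.
#pragma warning disable CS1998
    public static async Task BlockingProcess(int id)
    {
        Thread.Sleep(200); // blocks the calling thread for 200 ms
    }
#pragma warning restore CS1998

    // async method with a real await: control returns to the caller at the await.
    public static async Task YieldingProcess(int id)
    {
        await Task.Delay(200); // simulated asynchronous work (assumption)
    }

    public static void Main()
    {
        Task blocking = BlockingProcess(1);
        // The call only returned after the sleep, so the task is already done.
        Console.WriteLine($"BlockingProcess completed on return: {blocking.IsCompleted}");

        Task yielding = YieldingProcess(2);
        // The call returned at the await, so the task is still running.
        Console.WriteLine($"YieldingProcess completed on return: {yielding.IsCompleted}");

        yielding.Wait(); // let the pending task finish before exiting
    }
}
```

The first call hands back a finished task because nothing ever gave control back to the caller, which is why collecting such tasks in a list and waiting for them brings no gain.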

Executing the processes in parallel

To make sure we are getting a new thread for each process, we are modifying the code like this (see full file on Github):

private static async Task Process(int id)
{
    await Task.Run(() =>
    {
        Stopwatch timer = new Stopwatch();
        timer.Start();
        Thread.Sleep(200);
        Console.WriteLine($"Process {id}: {timer.Elapsed}");
    });
}

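This way, Task.Run() queues the work to run on a thread-pool thread, and Process() returns to the caller right away with a Task that completes when the work finishes.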

When we run the application again, these are the results:

Process 2: 00:00:00.2016508
Process 3: 00:00:00.2010563
Process 1: 00:00:00.2016712
Process 0: 00:00:00.2053744
Process 7: 00:00:00.2008342
Process 6: 00:00:00.2008357
Process 5: 00:00:00.2008776
Process 4: 00:00:00.2009419
Process 8: 00:00:00.2091808
Process 9: 00:00:00.2091704
Completed: 00:00:00.6359204

Finally we can see a notable performance increase, as we went from the original 2 seconds down to 0.6 seconds with the parallelized execution. The total is still above 200 ms, most likely because the thread pool starts with a limited number of threads, so the 10 processes ran in a few batches rather than all at once. Note as well that the completion order varies, as the processes no longer depend on the starting order but on when each one finishes.
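As a possible variation, and not the version in the sample repository, the same idea can be written so that Main does not block a thread while waiting: Process() returns the Task from Task.Run() directly, and the tasks are awaited together with Task.WhenAll(). The class name WhenAllDemo and the helper RunAllAsync are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Same simulated work as in the article, queued to the thread pool.
    // Returning the Task from Task.Run directly avoids an extra async wrapper.
    public static Task Process(int id)
    {
        return Task.Run(() =>
        {
            Stopwatch timer = Stopwatch.StartNew();
            Thread.Sleep(200);
            Console.WriteLine($"Process {id}: {timer.Elapsed}");
        });
    }

    // Starts all processes first, then awaits them together.
    // Returns how many tasks ran to completion.
    public static async Task<int> RunAllAsync(int count)
    {
        List<Task> tasks = new List<Task>();
        for (int i = 0; i < count; i++)
        {
            tasks.Add(Process(i)); // started, not awaited yet
        }

        await Task.WhenAll(tasks); // asynchronous counterpart of Task.WaitAll

        int completed = 0;
        foreach (Task task in tasks)
        {
            if (task.Status == TaskStatus.RanToCompletion)
            {
                completed++;
            }
        }
        return completed;
    }

    public static async Task Main()
    {
        Stopwatch timer = Stopwatch.StartNew();
        int completed = await RunAllAsync(10);
        Console.WriteLine($"Completed {completed} processes: {timer.Elapsed}");
    }
}
```

The total time will depend on how many thread-pool threads are available on the machine, so the figures may differ from the ones shown in this article.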

Conclusions

Using this fairly simple parallelization technique, we can dramatically increase the performance in our applications.
We need to understand correctly how async/await works because, as demonstrated, simply converting a method into async and invoking it with await is not enough. The calls have to be started without awaiting each one, and then waited for together, for example with Task.WaitAll().
Generally speaking, when you find some processes that do not rely on each other, you may be able to parallelize them, for example making a DB query, loading some data from an XML file, or making a request to an external API.
Tip: if you see a bunch of await calls in a row, consider whether they can be parallelized.
If a process needs the outcome of another process as an input, these cannot be parallelized.
Not all cases can be parallelized. For instance, when using EF Core to access a DB, a single context does not allow two simultaneous requests.
However, working with the file system is a great candidate for processing files in parallel and getting some performance gains.
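The tip about consecutive await calls can be sketched as follows. This is an illustration under assumptions, not code from the sample repository: QueryDatabaseAsync and LoadXmlAsync are hypothetical operations simulated with Task.Delay, and they are assumed to be independent of each other.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class IndependentAwaitsDemo
{
    // Hypothetical independent operations, each simulated with a 200 ms delay.
    public static async Task<int> QueryDatabaseAsync()
    {
        await Task.Delay(200);
        return 42; // made-up result for illustration
    }

    public static async Task<string> LoadXmlAsync()
    {
        await Task.Delay(200);
        return "<data />"; // made-up result for illustration
    }

    // Starts both operations before awaiting either, so the waits overlap.
    public static async Task<(int Rows, string Xml)> LoadBothAsync()
    {
        Task<int> dbTask = QueryDatabaseAsync();
        Task<string> xmlTask = LoadXmlAsync();

        await Task.WhenAll(dbTask, xmlTask);

        // Both tasks are complete here, so these awaits return immediately.
        return (await dbTask, await xmlTask);
    }

    public static async Task Main()
    {
        Stopwatch timer = Stopwatch.StartNew();
        (int rows, string xml) = await LoadBothAsync();
        Console.WriteLine($"Rows: {rows}, Xml: {xml}, Elapsed: {timer.Elapsed}");
    }
}
```

Writing `await QueryDatabaseAsync()` followed by `await LoadXmlAsync()` would instead run the two operations one after the other, which is only necessary when the second one needs the result of the first.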

In the next blog post I explain how to use thread-safe data types that can be accessed from different threads, for instance to run some calculations in parallel and collect all the results back into the same list.