
Do You Think Python Can Run Two Tasks Simultaneously?

In this post, I want to talk about parallel programming, but before that, I need to define a few terms.

Types of tasks

Sync

A sync (synchronous) task depends on another task: it must synchronize with that task and wait for it to finish before starting its own work. For example, consider the task of entering a room. It consists of two tasks: opening the door and walking in. If you don’t open the door, you can’t get in (there are other ways, but just take it as an example).

Async

Other tasks are not dependent on one another and can do their work independently. For example, drinking water and going to school are two tasks that don’t necessarily depend on each other: you can drink water whether or not you go to school, and you can go to school whether or not you drink water.

These two tasks can run independently of each other, and there’s no need for them to be synchronous.

Whether a task is synchronous or asynchronous depends on the nature and structure of that task and the operations it performs, nothing else.

Now, if your tasks are capable of running independently, meaning they are asynchronous, they have the capability to execute concurrently.

Concurrency vs. Parallelism

Concurrency

Concurrency is about structuring a program so that multiple tasks can make progress in overlapping periods of time: we organize asynchronous tasks in a way that allows them to run concurrently. Note that I said they have the capability to run concurrently, not that they will necessarily run simultaneously. Pay attention here.

Parallelism

Parallelism is the next step: tasks that have been structured concurrently are actually executed at the same time.

So, Concurrency pertains to the structure of the program, while Parallelism relates to its execution.

For example, consider a scenario where only one connection to a database can be active at a time, and suppose there’s a task that connects to the database and reads some data.

Now, if we structure the program in such a way that four instances of this task can run independently, we’ve made the program Concurrent. But due to the hardware limitation (assumed for the sake of the example), only one task can be connected to the database at any given moment – in practice, only one task can do its job at a time. Now, if we increase the number of allowed connections to three (through a hardware upgrade), then three tasks can run simultaneously, which is Parallelism.

Another example: consider the task of buying tickets (assuming there’s no limit on the number of available tickets). If we have one ticket counter and form four queues for ticket purchases, we’ve structured the system to be Concurrent. However, at any given moment, only one person can buy a ticket.

If we increase the number of ticket counters to three, now ticket sales are happening simultaneously, and at any given moment, three people are buying tickets concurrently.
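The ticket-counter scenario can be sketched in a few lines with a semaphore, where the semaphore’s value models the number of open counters. All names here are hypothetical, purely for illustration:

```python
import threading

counters = threading.Semaphore(1)  # number of open ticket counters; raise to 3 for simultaneous sales
sold = []
sold_lock = threading.Lock()

def buy_ticket(person):
    with counters:        # wait until a counter is free
        with sold_lock:   # protect the shared list
            sold.append(person)

people = [threading.Thread(target=buy_ticket, args=(i,)) for i in range(4)]
for t in people:
    t.start()
for t in people:
    t.join()
print(len(sold))
```

With Semaphore(1), the four buyers are structured concurrently but served one at a time; change it to Semaphore(3) and up to three can be served at once, without touching the rest of the program.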

Parallel Programming in Python

For parallel programming and concurrent processing in Python, we have several prominent modules:

  1. threading
  2. multiprocessing
  3. concurrent.futures
  4. asyncio
  5. gevent

Let me define thread and process.

Thread: A thread is a sequence of instructions executed within a process, and it depends on that process. Creating threads is cheap.

Process: Unlike threads, processes are entirely independent of each other and incur significant overhead to create.

The threading module utilizes threads, while the multiprocessing module utilizes processes for task execution.
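As a minimal sketch, here is how a thread is spawned with the standard threading module (the work function is just an illustration):

```python
import threading

results = []

def work(n):
    results.append(n * n)  # runs in the new thread

t = threading.Thread(target=work, args=(4,))
t.start()
t.join()         # wait for the thread to finish
print(results)
```

The multiprocessing module exposes a deliberately similar interface (multiprocessing.Process), so switching between the two is often a small change.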

concurrent.futures is a newer, unified interface on top of threads and processes, introduced in Python 3.2; you can learn more about it in the official documentation.

gevent employs a special type of lightweight thread called a green thread, which is very efficient for tasks requiring I/O operations.

asyncio uses a cooperative model for executing multiple tasks; the module was introduced in Python 3.4 (with the async/await syntax arriving in 3.5).

Here, we’re only focusing on threads and processes.

In Python, truly parallel execution of multiple tasks is only possible with processes.

Why can’t we achieve true parallelism with threads? The reason is the existence of the Global Interpreter Lock (GIL).

What is GIL?

The GIL (Global Interpreter Lock) is a lock used by the Python (CPython) interpreter to ensure that only one thread executes Python bytecode at a time – effectively, only one thread has access to the interpreter’s internal state at any given moment.

This mechanism was introduced to prevent two threads from corrupting the same portion of memory by accessing it simultaneously, ensuring that the program operates correctly. Other programming languages like C++, Java, and Go don’t have this mechanism and use alternative methods to handle simultaneous memory access.
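Note that even with the GIL, a compound operation like count += 1 (a read, an add, and a write) is not atomic, so shared state between threads still needs an explicit lock. A minimal sketch:

```python
import threading

count = 0
lock = threading.Lock()

def add(n):
    global count
    for _ in range(n):
        with lock:  # without this lock, increments from different threads could be lost
            count += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)
```

The GIL protects the interpreter’s own internals, not your program’s invariants.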

The reason Python employs the GIL instead of finer-grained locking is largely historical: Python is an older language, and back when it was designed, CPUs were single-core.

Now, let’s discuss the scheduling model of tasks:

Types of Multitasking Management

Preemptive Multitasking

In this model, there’s a central component that manages tasks and allocates or deallocates control to them. It’s also known as forced multitasking. Python uses this model for managing threads. In this scenario, each thread executes a task.

Cooperative Multitasking

In this model, tasks voluntarily yield control to each other, and all tasks are executed with their collaboration. This yielding occurs either when tasks are waiting for I/O or periodically. The asyncio module utilizes this model for task execution. By default, asyncio runs all tasks using a single thread.
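A minimal sketch of cooperative multitasking with asyncio: each task voluntarily yields control at the await, and all of them run on a single thread (asyncio.sleep stands in for a real I/O wait):

```python
import asyncio

async def task(i):
    await asyncio.sleep(0.1)  # control is yielded here, as a stand-in for real I/O
    return i

async def main():
    # the three tasks wait concurrently, so this takes ~0.1 s, not ~0.3 s
    return await asyncio.gather(*(task(i) for i in range(3)))

print(asyncio.run(main()))
```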

Control Switching in Threads

As mentioned, Python uses preemptive multitasking for managing threads, where thread control is in the hands of the operating system. If you run your program with 10 threads, for example, one thread executes first. Once a certain threshold is reached – 100 bytecode instructions in Python 2, or 5 milliseconds (the default switch interval) in Python 3 – or the thread blocks waiting for an I/O operation, control is taken away from that thread. The operating system then decides whether to return control to the same thread or to run another one.
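You can inspect (and change) this threshold from Python itself; in CPython 3 the default is 0.005 seconds:

```python
import sys

# The interval (in seconds) after which the interpreter considers
# switching to another thread; 0.005 s by default in CPython 3.
print(sys.getswitchinterval())
```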

This way, tasks are neither executed sequentially nor truly simultaneously; they’re executed between these two states, giving the impression of concurrent execution.

A downside of using threads is this constant control switching by the operating system. It’s worth experimenting with the number of threads to find the sweet spot: if the program runs on too many threads, it gets slower purely because of the switching overhead.

To decide whether to use threads or processes, a general rule of thumb is to use threads for tasks involving I/O and processes for computational tasks requiring CPU resources.

Comparison of Thread and Process Performance

As we mentioned, if your task involves I/O operations, such as fetching multiple web pages, it is better to use threads. However, if your task is computational, it’s better to use processes.

Consider the Fibonacci function, which is a CPU-bound computation:

def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


for _ in range(2):
    fib(35)

If we execute this code, the result would be:

time python3 fib.py

real	0m4,415s
user	0m4,360s
sys     0m0,016s

Let’s first interpret the time command’s output:

real is the wall-clock time, from the start to the end of the run.

user is the amount of CPU time spent running user code.

sys is the amount of CPU time spent in the kernel, for things like allocating memory, creating processes, and so on.

So, calculating the 35th Fibonacci number twice, sequentially, takes about 4.4 seconds.

Now, let’s perform the same task with 2 threads.

from concurrent.futures import ThreadPoolExecutor, as_completed


with ThreadPoolExecutor(max_workers=2) as e:
    futures = [e.submit(fib, 33) for _ in range(2)]
    for future in as_completed(futures):
        print(future.result())

The result would be

time python3 fib.py

real	0m4,921s
user	0m4,845s
sys     0m0,016s

The result is pretty much the same.

Now, let’s execute the same code with 2 processes:

from concurrent.futures import ProcessPoolExecutor, as_completed


with ProcessPoolExecutor(max_workers=2) as e:
    futures = [e.submit(fib, 33) for _ in range(2)]
    for future in as_completed(futures):
        print(future.result())

The result would be

time python3 test10.py 

real	0m2,503s
user	0m4,848s
sys     0m0,025s

The user time is the total CPU time spent across all cores. Here, user is about 4.8 seconds while real is only about 2.5 seconds: the work was split between two processes, each running on its own CPU core for roughly 2.5 seconds, so the wall-clock time was cut nearly in half. We benefited from multiple CPU cores.
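Taking the reported real times at face value, the quick arithmetic looks like this:

```python
# Wall-clock times from the runs above
sequential_real = 4.415  # two fib calls, one after the other
parallel_real = 2.503    # the same workload split across 2 processes

# Speedup close to the ideal 2x for two workers
print(round(sequential_real / parallel_real, 2))
```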

Now, let’s try an I/O-bound operation. Input/output refers to operations that send data to or receive data from an external system (disk, network, and so on).

In this test, we want to fetch 200 web pages, first using 10 threads and then using 8 processes (my system has 8 CPU cores).

The sequential version of this program takes several minutes.

Thread

from concurrent.futures import ThreadPoolExecutor
import requests 

urls = ['https://github.com?v=' + str(i) for i in range(200)]
with ThreadPoolExecutor(max_workers=10) as e:
    for url in urls:
        e.submit(requests.get, url)
time python3 io_thread.py

real	0m5,687s
user	0m27,060s
sys     0m0,397s

Process

from concurrent.futures import ProcessPoolExecutor
import requests 

urls = ['https://github.com?v=' + str(i) for i in range(200)]
with ProcessPoolExecutor(max_workers=8) as e:
    for url in urls:
        e.submit(requests.get, url)
time python3 io_process.py

real	0m5,106s
user	0m14,457s
sys     0m0,493s

The real (wall-clock) time is again pretty much the same.

Conclusion

  • True parallelism in Python can only be achieved using processes.
  • Use threads for I/O-bound tasks and processes for CPU-bound tasks.
  • If you use threads, Python will only utilize one CPU core at the same time.
  • Always test your code with both threads and processes to find the optimal number of workers and the right worker type.
  • Increasing the number of threads is not always the best option; the overhead of the control switching done by the operating system slows down the application.
  • If you need to execute a very large number of tasks (tens of thousands) simultaneously, and time matters to you, Python will not be a suitable solution.