multithreading Multiprocessing vs Threading in Python

شيماء المالحيناير 18, 2023

55 5 دقائق

Running this Python multithreading example script on the same machine used earlier results in a download time of 4.1 seconds! While this is much faster, it is worth mentioning that only one thread was executing at a time throughout this process due to the GIL. The reason it is still faster is because this is an IO bound task.

However, at the point the 1st thread is run the network IO will be requested by the thread. As this IO is performed outside of Python the GIL would release the lock, and allow the other thread to run. When the first thread’s IO returned the lock would then be reacquired. Therefore, threading has provided us with some additional benefit without requiring the overhead needed in creating multiple processes.

Now that we have an idea of how the code implementing parallelization looks like, let’s get back to the performance issues. As we’ve noted before, threading is not suitable for CPU bound tasks; in those cases it ends up being a bottleneck. GUI programs use threading all the time to make applications responsive. For example, in a text editing program, one thread can take care of recording the user inputs, another can be responsible for displaying the text, a third can do spell-checking, and so on.

However, your choice should be highly dependent on the type of task you need to perform. As mentioned above, CPU-bound tasks are executed well with a multiprocessing modules, whereas multithreading works well for IO-bound tasks. From the diagram above, we can see that in multithreading (middle diagram), multiple threads share the same code, data and files but run on a different register and stack. Multiprocessing (the right diagram) multiplies a single processor, replicating the code, data and files, which incurs more overhead. In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This PEP proposes to add a new module, interpreters, to support
inspecting, creating, and running code in multiple interpreters in the
current process.

InterpreterPoolExecutor

It’s also interesting to notice that the individual threads (particularly selenium threads) run faster in serial than in parallel, which is the typical bandwidth vs. latency tradeoff. Most if not all data science projects will see a massive increase in speed with parallel computing. In fact, many of the popular data science libraries already have parallelism built into them, you just have to enable it. So before trying to implement it on your own, look through the documentation of the library you’re using and check if it supports parallelism (by the way, I definitively recommend you to check out dask). In case it doesn’t, this article should assist you in implementing it on your own.

This allows the CPU to start the process and pass it off to the operating system (kernel) to do the waiting and free it up to execute in another application thread.
The module will also provide a basic
Queue class for communication between interpreters.
You can connect with Sumit on Twitter, LinkedIn, Github, and his website.
To demonstrate concurrency in Python, we will write a small script to download the top popular images from Imgur.

You may notice that multiprocessing might lead to higher CPU utilization due to multiple CPU cores being used by the program, which is expected. Multithreading and multiprocessing are two ways to achieve multitasking (think distributed computing) in Python. Many developers that are first-timers to concurrency in Python will end up using processing.Process and threading.Thread. However, these are the low-level APIs which have been merged together by the high-level API provided by the concurrent.futures module.

Scenario: Classification Using Scikit-Learn

Multiprocessing is for times when you really do want more than one thing to be done at any given time. Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. By putting each job in a Multiprocessing process, each can run on its own CPU and run at full efficiency.

Not the answer you’re looking for? Browse other questions tagged pythonmultithreadingmultiprocessing or ask your own question.

Discover how to use the Python multiprocessing module including how to create and start child processes and how to use a mutex locks and semaphores. This means that although we may have multiple threads in our program, only one thread can execute at python multiprocessing vs threading a time. Both the threading module and the multiprocessing module support the same concurrency primitives. Sharing data between processes is more challenging than between threads, although the multiprocessing API does provide a lot of useful tools.

Exploring Parallel Processing Libraries in Python: A Comprehensive Guide

Queue implements all the methods of queue.Queue except for
task_done() and join(), hence it is similar to
asyncio.Queue and multiprocessing.Queue. Interpreters.Queue objects act as proxies for the underlying
cross-interpreter-safe queues exposed by the interpreters module. Each Queue object represents the queue with the corresponding
unique ID. To set a value in the interpreter’s __main__.__dict__, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead.

That could be addressed by raising a surrogate
of the exception, whether a summary, a copy, or a proxy that wraps it. Any of those could preserve the traceback, which is useful for
debugging. The module will include a basic mechanism for communicating between
interpreters. Without one, multiple interpreters are a much less
useful feature. Any remaining subinterpreters are themselves finalized later,
but at that point they aren’t current in any OS threads.

Each process is, in fact, one instance of the Python interpreter that executes Python instructions (Python byte-code), which is a slightly lower level than the code you type into your Python program. A function can be run in a new thread by creating an instance of the threading.Thread class and specifying the function to run via the “target” argument. The Thread.start() function can then be called which will execute the target function in a new thread of execution. I am runing these codes using Python 3.6 on a Windows PC with 8 cores.

However, the threading module comes in handy when you want a little more processing power. The following examples demonstrate practical cases where multiple
interpreters may be useful. Both Interpreter.prepare_main() and Queue work only with
“shareable” objects.

Multithreading vs. Multiprocessing in Python

Before we dive into writing the code, we have to decide between threading and multiprocessing. As you’ve learned so far, threads are the best option when it comes to tasks that have some IO as the bottleneck. The task at hand obviously belongs to this category, as it is accessing an IMAP server over the internet.

There is no consistency and predictability on when threads will be interleaved. However, asyncio is synchronous as long as you are not awaiting on something. Event loop will keep executing until there is an await You can clearly see where coroutines are interleaved. Event loop will kick out a coroutine when the coroutine is awaiting. Multiprocessing
Each process has its own Python interpreter and can run on a separate core of a processor. Python multiprocessing is a package that supports spawning processes using an API similar to the threading module.