Python Internal Working: How It Works Under the Hood

Python, a language hailed for its simplicity and readability, often leaves learners curious about its inner workings. As you delve into the world of Python programming, understanding how Python executes your code internally can provide valuable insights into its efficiency and versatility.

Before we embark on our journey through Python's internal mechanisms, let's take a moment to distinguish between compiled and interpreted languages.

Compiler VS Interpreter:

In the realm of programming languages, compilers and interpreters play vital roles in converting high-level code into machine-executable instructions. While both serve the same purpose, they differ in their approach.

A compiler translates the entire source code into another form, typically machine code, before execution, and often produces a standalone executable file. An interpreter, on the other hand, reads and executes the program at runtime, statement by statement, so there is no separate build step for you to run.

Understanding How Python Operates:

Your computer, equipped with memory and a processor, relies on machine language instructions for execution. However, Python code, written in a high-level language, is incomprehensible to the processor in its raw form. Here's where Python's compilation process comes into play.

When you write Python code, save it with a .py extension, and run it, Python's compiler swings into action before any of your code executes. Let's break down the four key steps involved in this process:

  1. Tokenization: The source code undergoes tokenization, breaking it down into discrete units called tokens.

  2. Parsing: The stream of tokens is parsed into an Abstract Syntax Tree (AST), capturing the hierarchical structure of the code.

  3. Control Flow Graph (CFG): The AST is transformed into a Control Flow Graph, depicting the flow of control within the program.

  4. Bytecode Generation: Based on the CFG, bytecode is generated, representing low-level instructions that the Python Virtual Machine (PVM) can interpret.
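
The steps above can be watched from Python itself. The following is a minimal sketch using the standard library modules tokenize, ast, and dis; the one-line source string and the variable names are just illustrative assumptions. CPython does not expose the Control Flow Graph step directly, so the built-in compile() is used here to cover steps 3 and 4 together.

```python
import ast
import dis
import io
import tokenize

source = "total = 1 + 2\n"

# Step 1: tokenization. Each token carries a type and the matched text.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_texts = [tok.string for tok in tokens if tok.string.strip()]
print(token_texts)  # ['total', '=', '1', '+', '2']

# Step 2: parsing the source into an Abstract Syntax Tree.
tree = ast.parse(source)
print(ast.dump(tree))

# Steps 3 and 4: compile() builds the control flow graph internally
# and returns a code object that holds the bytecode.
code = compile(tree, filename="<example>", mode="exec")
dis.dis(code)  # human-readable listing of the bytecode instructions

# Running the code object shows that it is the same program.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # 3
```

The exact instructions printed by dis.dis() differ between Python versions, so treat the listing as something to explore rather than memorize.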

Python Virtual Machine (PVM):

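The bytecode generated by the compiler is not directly executable by the processor. Instead, it's handed over to the Python Virtual Machine (PVM) for interpretation. The PVM acts as an interpreter: it reads the bytecode one instruction at a time and carries out each one by running the matching routine that is already compiled into the interpreter. What the processor executes is the interpreter's own machine code; by default, CPython does not convert your bytecode into machine code.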

Now, you might wonder why Python, often dubbed an interpreted language, undergoes a compilation step similar to languages like Java. The difference lies in how visible that step is. Java asks you to run the compiler yourself and then run the resulting class files, whereas Python compiles to bytecode automatically whenever it needs to and caches the result for imported modules. The compilation is abstracted away, giving the impression of direct interpretation.
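
To make that hidden compilation step visible, here is a small sketch that asks the standard library module py_compile to do by hand what CPython normally does automatically on import. The file name hello.py and the temporary directory are assumptions made for the example.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = pathlib.Path(tmp) / "hello.py"
    source_path.write_text("print('hello')\n")

    # py_compile.compile() writes the bytecode file and returns its path.
    pyc_path = pathlib.Path(py_compile.compile(str(source_path), doraise=True))
    print(pyc_path.parent.name, pyc_path.name)

    # A .pyc file starts with a magic number that identifies the
    # bytecode format of the Python version that produced it.
    magic_matches = pyc_path.read_bytes()[:4] == importlib.util.MAGIC_NUMBER
    print(magic_matches)
```

On a default CPython setup the file lands in a __pycache__ folder next to the source, with the interpreter version in its name. That is why you rarely notice the compile step: it happens once and is reused until the source changes.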

Now that we've grasped the basics of how Python compiles and interprets code, let's shine a light on the Python Virtual Machine (PVM) and its pivotal role in executing Python programs.

Think of the PVM as the engine that drives your Python code. Once the compiler generates bytecode, it's the PVM's responsibility to take that bytecode and carry it out, instruction by instruction, on your computer's processor.
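
To get a feel for such an engine, here is a toy stack machine written in Python. It is only a sketch: the real CPython loop is written in C and handles far more instructions, and the run() function and the tuple-based program format are hypothetical. The opcode names are borrowed from CPython purely for familiarity.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    variables = {}
    for opcode, arg in program:          # fetch the next instruction
        if opcode == "LOAD_CONST":       # decode it, then perform it
            stack.append(arg)
        elif opcode == "LOAD_NAME":
            stack.append(variables[arg])
        elif opcode == "STORE_NAME":
            variables[arg] = stack.pop()
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return variables


# Hypothetical program equivalent to:  x = 2; y = x + 40
program = [
    ("LOAD_CONST", 2), ("STORE_NAME", "x"),
    ("LOAD_NAME", "x"), ("LOAD_CONST", 40), ("ADD", None), ("STORE_NAME", "y"),
]
result = run(program)
print(result)  # {'x': 2, 'y': 42}
```

Notice that the loop never produces machine code for the program. It simply decides what each instruction means and does the work itself, which is the essence of interpretation.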

Here's how the PVM operates:

  1. Bytecode Interpretation: Upon receiving bytecode from the compiler, the PVM's interpreter springs into action. It steps through the bytecode in an evaluation loop: fetch the next instruction, work out which operation it names, perform that operation, and repeat.

  2. Dynamic Typing: One of Python's defining features is its dynamic typing. Unlike statically typed languages like C++ or Java, Python does not attach a type to a variable. A variable is just a name bound to an object, the object carries the type, and the same name can be rebound to an object of a different type at any time. This means that the PVM must check the types of the objects involved at runtime, each time it performs an operation.

  3. Memory Management: Python's automatic memory management, facilitated by the PVM, relieves developers of the burden of manual memory allocation and deallocation. The PVM employs techniques like reference counting and garbage collection to ensure efficient memory usage and prevent memory leaks.

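  4. Platform Independence: Thanks to the PVM's abstraction layer, the same Python source code runs across different platforms without modification. Whether you're on Windows, macOS, or Linux, the PVM ensures consistent behavior, although speed still depends on the machine and the bytecode format is tied to the Python version that produced it.

  5. Optimizations: Python implementations employ various optimizations to enhance performance. CPython, the reference implementation, caches compiled bytecode on disk so that modules need not be recompiled on every import. PyPy, an alternative implementation, adds Just-In-Time (JIT) compilation, which turns frequently executed code into machine code. CPython 3.13 introduced an experimental JIT of its own, which is off by default.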
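
The dynamic typing point in the list above is easy to see in a few lines. This is a minimal sketch; the names value and double are made up for the example.

```python
value = 42
first_type = type(value).__name__

# Rebinding the same name to an object of a different type is allowed,
# because the type belongs to the object, not to the name.
value = "forty-two"
second_type = type(value).__name__
print(first_type, second_type)  # int str


def double(x):
    # The meaning of * is looked up at runtime from the type of x.
    return x * 2


print(double(21), double("ab"), double([1]))  # 42 abab [1, 1]
```

The function double() is compiled once, yet the same bytecode multiplies a number, repeats a string, or repeats a list, depending on what it receives at runtime.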

By bridging the gap between Python bytecode and machine instructions, the PVM empowers developers to write code that is both elegant and efficient. Its adaptability and versatility make it a cornerstone of Python's success as a programming language.

Python's Memory Management:

As we move further into the heart of Python's inner workings, it's essential to shed light on how Python manages memory behind the scenes. Understanding Python's memory model is crucial for writing efficient and robust code.

At the core of Python's memory management lies the Python memory allocator, responsible for allocating and deallocating memory dynamically. Here's a glimpse into how Python's memory management system operates:

  1. Object Allocation: Every object in Python, whether it's an integer, a list, or a function, is represented in CPython by a structure in memory that begins with a PyObject header holding the object's reference count and a pointer to its type. When you create a new object in Python, the memory allocator allocates memory for the object and initializes these fields.

  2. Reference Counting: Python employs a reference counting mechanism to track the number of references to each object. Every time a new reference to an object is created, Python increments its reference count. Conversely, when a reference is deleted or goes out of scope, Python decrements the reference count. When an object's reference count drops to zero, Python deallocates the memory associated with the object.

  3. Garbage Collection: While reference counting efficiently handles most memory management tasks, it's not foolproof. Circular references, where objects reference each other in a loop, keep every count above zero even after the program can no longer reach those objects, so reference counting alone would never free them. To address this, Python employs a garbage collection mechanism that identifies and collects cyclically referenced objects, freeing up memory for reuse.

  4. Memory Pooling: To improve memory allocation performance, Python utilizes memory pooling techniques. Instead of requesting memory from the operating system for each new object, Python maintains a pool of pre-allocated memory blocks of varying sizes. When you create a new object, Python retrieves memory from these pre-allocated blocks, reducing overhead and fragmentation.

  5. Memory Optimization: Python's memory management system is continually evolving to optimize performance and reduce memory overhead. Successive CPython releases have tuned the allocator and made common objects more compact, for example by letting instances of the same class share the keys of their attribute dictionaries, resulting in improved memory utilization and a reduced memory footprint.
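
Reference counting and cycle collection from the list above can both be observed with the standard library modules sys, gc, and weakref. This is a sketch that assumes CPython, where an object is freed the moment its count reaches zero; the Node class is hypothetical.

```python
import gc
import sys
import weakref


class Node:
    def __init__(self):
        self.partner = None


# Reference counting: one more name means one more reference.
obj = Node()
before = sys.getrefcount(obj)
alias = obj
after = sys.getrefcount(obj)
print(after - before)  # 1
del alias

# A weak reference lets us watch the object without keeping it alive.
probe = weakref.ref(obj)
del obj                    # last reference gone, object freed at once
print(probe() is None)     # True on CPython

# Cycles: two objects that point at each other never reach zero.
gc.disable()               # stop the automatic collector for the demo
a, b = Node(), Node()
a.partner, b.partner = b, a
cycle_probe = weakref.ref(a)
del a, b
alive_before_gc = cycle_probe() is not None
gc.collect()               # the cycle detector finds and frees the pair
alive_after_gc = cycle_probe() is not None
gc.enable()
print(alive_before_gc, alive_after_gc)  # True False
```

Note that sys.getrefcount() always reports one more than you might expect, because passing the object as an argument creates a temporary reference. That is why the sketch compares two readings instead of relying on an absolute number.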

By understanding Python's memory management intricacies, developers can write more efficient and scalable code, minimizing memory leaks and maximizing performance.

An Example of How Python Memory Management Works:

The string "Hello" is stored in the heap, and a reference to that object is created in the stack. In the diagram above, both variables ss and s hold the same memory address, because both refer to the same object. The same applies to numbers.

So when s = "good bye" runs, the value in the heap is not replaced, and ss, which still refers to it, stays the same. Instead, a new value is created in the heap, and s now refers to the address of that new value.

Each object carries a reference count, which is the number of references currently pointing at it. If ss later changes its reference as well and no references to "Hello" remain, the count drops to zero and Python deallocates the object from the heap. The garbage collector is only needed for objects that are caught in reference cycles.
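
The same scenario can be checked in code. This sketch uses the is operator, which tests whether two names refer to the very same object; the flag names same_object and still_same are made up for the example.

```python
s = "Hello"
ss = s                      # ss now refers to the same object as s
same_object = ss is s
print(same_object)          # True

s = "good bye"              # s is rebound to a new object
print(ss)                   # Hello
print(s)                    # good bye

still_same = ss is s
print(still_same)           # False
```

Nothing was overwritten in place. The assignment only changed which object the name s points to, while ss kept pointing at the original string.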

For a deeper understanding, you can use memory dump tools to extract and analyse data from RAM.

That's it for how Python works internally.
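
A full RAM dump needs external tools, but as a lighter stand-in you can inspect allocations from inside the program. The sketch below swaps in the standard library module tracemalloc, which is my substitution and not a memory dump tool in the strict sense; the list named data is just a made-up workload.

```python
import tracemalloc

tracemalloc.start()

# Hypothetical workload: build ten thousand small strings.
data = [str(n) * 10 for n in range(10_000)]

current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current={current} bytes, peak={peak} bytes")

# Show the source lines responsible for the most allocated memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

The numbers vary by Python version and platform, so the useful part is the ranking: it points you to the lines of your own code that allocate the most.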

Thank you😊.