Value and Reference

Whatever you do in the .NET Framework, you deal with either value types or reference types. Yet in discussions with fellow developers, and on online forums and Q&A sites, there seems to be a great deal of confusion about where the actual variables reside. The topic is basic, yet it causes many misconceptions. One of them is that value types always reside on the stack and reference objects reside on the heap. We will try to clear up some of those misunderstandings by carefully examining what really happens (with the current implementation of the .NET runtime, which at the time of writing is .NET 4.5.1).

Before we dig deeper, I just want to say that this is by no means a comprehensive guide to how types are handled in the .NET Framework. That would take a whole book. I’m simply trying to get a few general concepts clear by working from the foundations and sketching one possibility of what happens behind the scenes at the lowest level.

Breaking Up the Process

When we talk about how an operating system works, we inevitably arrive at some fundamental abstractions. In order to provide a uniform mechanism for manipulating a wide range of hardware, the operating system needs to abstract. Let’s get a few things out of the way first, such as proper definitions of what a process and an address space actually are.

The process is the most fundamental abstraction; it represents a running program. Multiple processes can run concurrently on the same machine, but each one appears to have the machine to itself and to execute its instructions sequentially.

Virtual memory is an operating system abstraction that provides each process with the illusion that it has exclusive use of the main memory. So when we say process memory, we’re actually talking about an addressable memory space. On 32-bit desktop Windows this is a 4 GB virtual address space, of which each application gets 2 GB by default (the rest is reserved for the system).

In .NET running on Windows there is another layer on top of this (the runtime), but because managed code is JIT-compiled to native code that runs inside an ordinary process, the .NET Framework effectively uses the Windows virtual address space model.

The Process’ Virtual Address Space

Without going into too much detail and nasty specifics, conceptually it looks something like Image 1.

Image 1. Conceptual image of the virtual address space

As we can see in the image, the process has an area for the code (including resources etc.), then the stack and the heap, and some other areas that it uses for shared resources such as DLLs and OS-specific DLLs. Both the heap and the stack expand and contract dynamically at runtime. There is one more thing about the stack: whereas the heap is (pretty much) only there to provide space for objects, the stack is used by the compiler to implement function calls. That is, the stack expands and contracts as the user code performs function calls at runtime. As we will see later, this difference is exactly where the general misconception about where value and reference types live comes from.

The Program Stack

A procedure/function (or method, if we want to be more general) call involves passing both data (in the form of procedure parameters and return values) and control from one part of a program to another. In addition, it must allocate space for the local variables of the procedure on entry and deallocate it on exit. Most machines provide simple instructions for transferring control to and from procedures. The passing of data and the allocation and deallocation of local variables is handled by manipulating the program stack.

This particular part is very interesting because it is a sort of overlap between the OS abstraction and the actual hardware itself. If we go down to the processor level for a moment, we can say that programs make use of the program stack to support procedure calls. The machine uses the stack to pass procedure arguments, to store return information, to save registers for later restoration, and for local storage. The part of the stack allocated for a single procedure call is called a stack frame. You can see an approximation of how that looks in Image 2.

Image 2. Stack frame structure

The topmost stack frame is delimited by two pointers, with register (ebp) serving as the frame pointer, and register (esp) serving as the stack pointer. The stack pointer can move while the procedure is executing, and hence most information is accessed relative to the frame pointer. The stack grows toward lower addresses and shrinks toward higher addresses and the stack pointer (esp) points to the top element of the stack.

We will go into more detail when we examine what happens with the actual value and reference types in the stack and the heap.

The .NET Framework and Its Types

Having all this in mind on the lower level of the systems, let’s revisit the .NET part of things and try to tie things together.

We have two kinds of behavior in .NET: value behavior and reference behavior. The distinction between the two is conceptual. A value type variable holds the value itself, the actual data, whereas a reference holds a memory location: the address of an object instance created on the heap. A reference is a sort of link to the actual object in the virtual address space (I deliberately use that term instead of memory, to recall the abstraction we made earlier). Image 3 illustrates this.

Image 3. Value and reference

The most common representative of the reference types is the class (there are also interfaces and delegates), and, besides the built-in value types such as int and decimal, the most widely used representative of the value types is the struct (enums are value types as well).

We can think of Image 3 in terms of variables. Consider the following code:

   Variable x1 = new Variable();  
   Variable x2 = x1;  

Variable can be either a class or a struct. If it is a struct, the assignment copies the value of x1 into x2, so in the image there would be two identical but independent copies of the data. If it is a class, the assignment copies only the reference, so both x1 and x2 refer to the same object in memory (which is stored on the heap).
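To make the difference concrete, here is a minimal sketch. The types PointValue and PointReference are hypothetical names made up for this illustration; they stand in for the struct and class flavours of Variable.

```csharp
using System;

// Hypothetical value type, used only for this illustration.
struct PointValue
{
    public int X;
}

// Hypothetical reference type, used only for this illustration.
class PointReference
{
    public int X;
}

static class CopyDemo
{
    public static int CopyStruct()
    {
        PointValue v1 = new PointValue();
        v1.X = 1;
        PointValue v2 = v1;   // copies the data itself
        v2.X = 2;             // changes only the copy
        return v1.X;
    }

    public static int CopyReference()
    {
        PointReference r1 = new PointReference();
        r1.X = 1;
        PointReference r2 = r1;   // copies only the reference
        r2.X = 2;                 // changes the one shared object
        return r1.X;
    }

    static void Main()
    {
        Console.WriteLine(CopyStruct());     // prints 1
        Console.WriteLine(CopyReference());  // prints 2
    }
}
```

With the struct, v1 is untouched because v2 is a separate copy. With the class, r1 and r2 are two references to the same object, so the change is visible through both.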

Now we get to the point where we make the most important statement of all (having in mind everything we said so far). That statement is that the actual values of the variables are stored wherever they are declared.

What do we mean by this? Let’s look at some examples.

Value and Reference Type In a Method

It is much clearer to discuss this when we have code, so take a look at the example below.

 
   struct AValue  
   {  
     public int SomeIntInAValue;  
     public AReference refInAValue;  
   
     public AValue(AReference aReference)  
     {  
       refInAValue = aReference;  
       SomeIntInAValue = 0;  
     }  
   }  
   
   class AReference  
   {  
     public int SomeIntInAReference;  
     public decimal SomeDecimalInAReference;  
     public string SomeStringInAReference;  
   }  

and now imagine we have a method that looks something like this:

  public int AMethodWithReferenceAndValueType()  
  {  
    AReference ref1 = new AReference();  
    AValue val1 = new AValue(new AReference());   
   
     // Do something with them and return 1  
     return 1;  
  }  

The method AMethodWithReferenceAndValueType has one local variable of each kind, a class and a struct. When the method is called, the return address is put on the stack, the state of the registers is saved and so on, and then space is made for the local variables. In this case the first one is ref1, a reference: the address of the new AReference object that is created on the heap when new AReference() is executed. That is, the runtime took the blueprint (the type) for AReference and created an object from it on the heap, but the actual reference to it is on the stack. Furthermore, the AReference class has several members (I know, public fields… I deserve to be punished and tortured for that): an int, a decimal and a string. A string is an object, i.e. a reference type, so it is clear that the string data itself will be on the heap (if it had a value, e.g. = “Something”). What about the int and the decimal? Well, because they are part of a reference object that resides on the heap, they will reside on the heap as well, as will the reference SomeStringInAReference.

Note that many types (such as string) appear in some ways to be value types, but in fact are reference types. These are known as immutable types. This means that once an instance has been constructed, it can’t be changed. This allows a reference type to act similarly to a value type in some ways – in particular, if you hold a reference to an immutable object, you can feel comfortable in returning it from a method or passing it to another method, safe in the knowledge that it won’t be changed behind your back. This is why, for instance, the string.Replace doesn’t change the string it is called on, but returns a new instance with the new string data in – if the original string were changed, any other variables holding a reference to the string would see the change, which is very rarely what is desired.
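A small sketch of the string.Replace behavior just described. The class and variable names are made up for the example; string.Replace itself is the standard .NET method.

```csharp
using System;

static class StringDemo
{
    public static string[] ReplaceDemo()
    {
        string original = "Hello World";
        string alias = original;   // a second reference to the same string object

        // Replace does not modify the existing string; it returns a new one.
        string replaced = original.Replace("World", "There");

        return new[] { original, alias, replaced };
    }

    static void Main()
    {
        foreach (string s in ReplaceDemo())
        {
            Console.WriteLine(s);
        }
        // prints:
        // Hello World
        // Hello World
        // Hello There
    }
}
```

Both original and alias still see "Hello World", because the object they refer to was never changed.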

The case of the struct variable is different. The variable itself is the value. Its size is roughly the sum of the sizes of all its members (plus any padding), and that is what the runtime will put on the stack. Here that is an int, which is 4 bytes, and a reference, which is 4 or 8 bytes depending on whether the process is 32-bit or 64-bit. Of course this is just a rough estimate, but you get the picture. The actual AReference instance that the struct’s refInAValue field points to will be on the heap.
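The flip side is that a struct does not have to live on the stack at all. When it is declared as a field of a class, its data is stored inside the object, wherever that object lives. C# cannot print where something is stored, so the sketch below only shows the observable consequence. Counter and Holder are hypothetical types invented for the illustration.

```csharp
using System;

// Hypothetical value type.
struct Counter
{
    public int Count;
}

// Hypothetical reference type with a struct field.
// The Counter data is stored inline, as part of the Holder object.
class Holder
{
    public Counter Embedded;
}

static class StorageDemo
{
    public static int SharedThroughReferences()
    {
        Holder h1 = new Holder();
        Holder h2 = h1;             // second reference to the same object
        h2.Embedded.Count = 5;      // writes into the struct stored inside the object
        return h1.Embedded.Count;   // the same struct, seen through h1
    }

    public static int CopiedOutOfObject()
    {
        Holder h = new Holder();
        Counter local = h.Embedded; // copies the value into a local variable
        local.Count = 7;            // changes only the local copy
        return h.Embedded.Count;    // the struct inside the object is untouched
    }

    static void Main()
    {
        Console.WriteLine(SharedThroughReferences());  // prints 5
        Console.WriteLine(CopiedOutOfObject());        // prints 0
    }
}
```

The struct inside Holder follows the object around, while the local copy is an independent value that belongs to the method.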

Value and Reference Types as Parameters of a Method

When it comes to passing the two kinds of types as parameters to a method, all parameters are, by default, value parameters.

What does that mean for each type?

First, for the value type the behavior is straightforward. A new memory location is created on the stack for the function argument, and the value of the argument is copied into this location. That’s it: the value of one int is copied to another int. Take a look at the following method.

  public int AMethodWithValueTypeParameter(int val)
  {
     Console.WriteLine("{0} is the value of val in AMethodWithValueTypeParameter.", val);

     // Changes only the method's own copy of the value
     val = 12345;

     Console.WriteLine("{0} is the value of val in AMethodWithValueTypeParameter after it has been changed.", val);

     // Do something with them and return 1
     return 1;
  }

Now suppose we call it with the following code from somewhere:

  int val = 54321;
  AMethodWithValueTypeParameter(val);

  Console.WriteLine("{0} is the value of val after AMethodWithValueTypeParameter is run.", val);

The output will be:

 54321 is the value of val in AMethodWithValueTypeParameter.  
 12345 is the value of val in AMethodWithValueTypeParameter after it has been changed.  
 54321 is the value of val after AMethodWithValueTypeParameter is run.  

This means the changed value was lost once the method returned and its stack frame was popped. The only thing you have left is the original val variable that you passed when you called AMethodWithValueTypeParameter. So the method made a copy, used it, and forgot about it when it exited the scope of the function. Pretty straightforward.
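The same holds for structs: the whole value is copied into the parameter. A minimal sketch, where Size2D and Enlarge are hypothetical names chosen for the example:

```csharp
using System;

// Hypothetical value type.
struct Size2D
{
    public int Width;
    public int Height;
}

static class ByValueDemo
{
    // The parameter is a copy of the caller's struct.
    static void Enlarge(Size2D size)
    {
        size.Width *= 2;
        size.Height *= 2;
    }

    public static int WidthAfterCall()
    {
        Size2D s = new Size2D();
        s.Width = 10;
        s.Height = 20;

        Enlarge(s);       // works on a copy; s itself is not touched
        return s.Width;
    }

    static void Main()
    {
        Console.WriteLine(WidthAfterCall());  // prints 10
    }
}
```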

What about references?

Let’s suppose now we have:

  private int AMethodWithReferenceTypeParameter(AReference ref1)  
  {  
     ref1.SomeIntInAReference = 11111;  
   
     // Do something with them and return 1  
     return 1;  
  }  

and we call it with something like:

  AReference ref1 = new AReference { 
     SomeDecimalInAReference = 12.5m, SomeIntInAReference = 12345, SomeStringInAReference = "A String"
  };  
  AMethodWithReferenceTypeParameter(ref1);  

If we print the value of SomeIntInAReference after the call, we will see that it has changed. This could lead us to the conclusion that the entire object was passed to the method and changed there. That is not true: reference type objects are never passed in C#, only references. The value changed because the . (dot) operator manipulates the members of the object that the reference points to. Again, when we enter AMethodWithReferenceTypeParameter, only the value of the reference is stored on the stack. When execution reaches the . operator and SomeIntInAReference, it calculates where on the heap that field lives and changes the value at that address.

The fact that only the reference is passed, by value, can be illustrated if we do something like:

  private int AMethodWithReferenceTypeParameter(AReference ref1)  
  {  
     ref1 = null;  
   
     // Do something with them and return 1  
     return 1;  
  }  

Contrary to what we might conclude from the previous example, here the caller’s reference variable ref1 does not change after the method returns. It still points to the AReference instance.
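For contrast, C# also has the ref keyword, which passes the variable itself rather than a copy of its value. The sketch below puts the two side by side. Box, ClearByValue and ClearByRef are hypothetical names for the illustration; ref is the standard C# parameter modifier.

```csharp
using System;

// Hypothetical reference type.
class Box
{
    public int Value;
}

static class RefDemo
{
    // Default: the reference is copied, so only the local copy is set to null.
    static void ClearByValue(Box box)
    {
        box = null;
    }

    // With ref: the caller's variable itself is set to null.
    static void ClearByRef(ref Box box)
    {
        box = null;
    }

    public static bool SurvivesByValue()
    {
        Box b = new Box();
        ClearByValue(b);
        return b != null;
    }

    public static bool SurvivesByRef()
    {
        Box b = new Box();
        ClearByRef(ref b);
        return b != null;
    }

    static void Main()
    {
        Console.WriteLine(SurvivesByValue());  // prints True
        Console.WriteLine(SurvivesByRef());    // prints False
    }
}
```

Only the ref version can change what the caller's variable refers to, which is exactly what the default, by-value passing of references does not allow.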

Conclusion

So, bottom line:

  • The operating system uses abstractions such as the virtual address space, so that to each process the entire memory appears to be available as one long list of bytes.
  • The passing of data and the allocation and deallocation of local variables is handled by manipulating the program stack.
  • The value of a reference variable is a reference (an address in memory), not the actual object itself.
  • The value of a value type variable is the data itself.
  • References themselves can be either on the stack or the heap, depending on whether they are local variables or arguments in a method, or they are part of another object. Reference type objects are always on the heap.
  • Value types are on the stack or the heap depending on context as well.
  • All parameters are value parameters. The references are passed by value, the object they are referencing is a different story (but it doesn’t get passed). The value types are passed by value as well.

Well, that’s all folks, at least for now. I hope this helps somebody out there. Like I said, it’s a conceptual overview and by no means a comprehensive guide to types in .NET and C#. If you like the subject, I encourage you to read further. Check out Jon Skeet and his book C# in Depth, check out Eric Lippert’s blog and also this wonderful book: Computer Systems: A Programmer’s Perspective by Randal E. Bryant and David R. O’Hallaron.

Happy coding.

//Bojan
