Sunday, April 5, 2009

Single-File and Multifile Assemblies

In a great number of cases, there is a simple one-to-one correspondence between a .NET assembly
and the binary file (*.dll or *.exe). Thus, if you are building a .NET *.dll, it is safe to consider that
the binary and the assembly are one and the same. Likewise, if you are building an executable desktop
application, the *.exe can simply be referred to as the assembly itself. As you’ll see in Chapter 11,
however, this is not completely accurate. Technically speaking, if an assembly is composed of a single
*.dll or *.exe module, you have a single-file assembly. Single-file assemblies contain all the necessary
CIL, metadata, and associated manifest in an autonomous, single, well-defined package.
Multifile assemblies, on the other hand, are composed of numerous .NET binaries, each of which
is termed a module. When building a multifile assembly, one of these modules (termed the primary
module) must contain the assembly manifest (and possibly CIL instructions and metadata for various
types). The other related modules contain a module level manifest, CIL, and type metadata. As you
might suspect, the primary module documents the set of required secondary modules within the
assembly manifest.
So, why would you choose to create a multifile assembly? When you partition an assembly into
discrete modules, you end up with a more flexible deployment option. For example, if a user is referencing a remote assembly that needs to be downloaded onto his or her machine, the runtime will only download the required modules. Therefore, you are free to construct your assembly in such a way that less frequently required types (such as a type named HardDriveReformatter) are kept in a separate stand-alone module.
In contrast, if all your types were placed in a single-file assembly, the end user may end up
downloading a large chunk of data that is not really needed (which is obviously a waste of time).
Thus, as you can see, an assembly is really a logical grouping of one or more related modules that
are intended to be initially deployed and versioned as a single unit.
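To make the assembly/module relationship concrete, here is a minimal sketch that asks a loaded assembly which modules it contains. The class name ModuleLister and its helper method are hypothetical, chosen only for illustration; Assembly.GetModules() and Module.Name are part of System.Reflection. For a single-file assembly, you should expect exactly one module to be listed.

```csharp
using System;
using System.Reflection;

// Hypothetical helper type, used only for this illustration.
public static class ModuleLister
{
    // Returns the file name of every module that makes up the given assembly.
    public static string[] GetModuleNames(Assembly asm)
    {
        Module[] modules = asm.GetModules();
        string[] names = new string[modules.Length];
        for (int i = 0; i < modules.Length; i++)
        {
            names[i] = modules[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        // Inspect the assembly that contains this very code.
        Assembly asm = Assembly.GetExecutingAssembly();
        Console.WriteLine("Assembly: {0}", asm.GetName().Name);
        foreach (string name in GetModuleNames(asm))
        {
            Console.WriteLine("  Module: {0}", name);
        }
    }
}
```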

The Role of the Common Intermediate Language
Now that you have a better feel for .NET assemblies, let’s examine the role of the common
intermediate language (CIL) in a bit more detail. CIL is a language that sits above any particular
platform-specific instruction set. Regardless of which .NET-aware language you choose, the
associated compiler emits CIL instructions. For example, the following C# code models a trivial
calculator. Don’t concern yourself with the exact syntax for now, but do notice the format of the
Add() method in the Calc class:
// Calc.cs
using System;
namespace CalculatorExample
{
    // This class contains the app's entry point.
    public class CalcApp
    {
        static void Main()
        {
            Calc c = new Calc();
            int ans = c.Add(10, 84);
            Console.WriteLine("10 + 84 is {0}.", ans);
            // Wait for user to press the Enter key before shutting down.
            Console.ReadLine();
        }
    }
    // The C# calculator.
    public class Calc
    {
        public int Add(int x, int y)
        { return x + y; }
    }
}
Once the C# compiler (csc.exe) compiles this source code file, you end up with a single-file
*.exe assembly that contains a manifest, CIL instructions, and metadata describing each aspect of
the Calc and CalcApp classes. For example, if you were to open this assembly using ildasm.exe
(examined a little later in this chapter), you would find that the Add() method is represented using
CIL such as the following:
.method public hidebysig instance int32 Add(int32 x, int32 y) cil managed
{
  // Code size 8 (0x8)
  .maxstack 2
  .locals init ([0] int32 CS$1$0000)
  IL_0000: ldarg.1
  IL_0001: ldarg.2
  IL_0002: add
  IL_0003: stloc.0
  IL_0004: br.s IL_0006
  IL_0006: ldloc.0
  IL_0007: ret
} // end of method Calc::Add

Don’t worry if you are unable to make heads or tails of the resulting CIL for this method—
Chapter 15 will describe the basics of the CIL programming language. The point to concentrate on
is that the C# compiler emits CIL, not platform-specific instructions.
Now, recall that this is true of all .NET-aware compilers. To illustrate, assume you created this
same application using Visual Basic .NET (VB .NET), rather than C#:
' Calc.vb
Imports System
Namespace CalculatorExample
    ' A VB .NET 'Module' is a class that only contains
    ' static members.
    Module CalcApp
        Sub Main()
            Dim ans As Integer
            Dim c As New Calc
            ans = c.Add(10, 84)
            Console.WriteLine("10 + 84 is {0}.", ans)
            Console.ReadLine()
        End Sub
    End Module
    Class Calc
        Public Function Add(ByVal x As Integer, ByVal y As Integer) As Integer
            Return x + y
        End Function
    End Class
End Namespace

If you examine the CIL for the Add() method, you find similar instructions (slightly tweaked by
the VB .NET compiler):
.method public instance int32 Add(int32 x, int32 y) cil managed
{
  // Code size 9 (0x9)
  .maxstack 2
  .locals init ([0] int32 Add)
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldarg.2
  IL_0003: add.ovf
  IL_0004: stloc.0
  IL_0005: br.s IL_0007
  IL_0007: ldloc.0
  IL_0008: ret
} // end of method Calc::Add
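One visible difference between the two listings is add versus add.ovf. The VB .NET compiler checks integer arithmetic for overflow by default, whereas the C# compiler does not unless asked. The following sketch (the OverflowDemo class and its method names are hypothetical) uses the C# checked and unchecked keywords to request each behavior explicitly, so it should behave the same regardless of compiler settings.

```csharp
using System;

// Hypothetical type, used only to contrast the two kinds of addition.
public static class OverflowDemo
{
    // Unchecked context: overflow silently wraps around (plain 'add' in CIL).
    public static int AddUnchecked(int x, int y)
    {
        return unchecked(x + y);
    }

    // Checked context: overflow throws OverflowException ('add.ovf' in CIL).
    public static int AddChecked(int x, int y)
    {
        return checked(x + y);
    }

    public static void Main()
    {
        // Wraps around to int.MinValue.
        Console.WriteLine(AddUnchecked(int.MaxValue, 1));

        try
        {
            AddChecked(int.MaxValue, 1);
        }
        catch (OverflowException)
        {
            Console.WriteLine("Overflow detected.");
        }
    }
}
```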

Benefits of CIL
At this point, you might be wondering exactly what is gained by compiling source code into CIL
rather than directly to a specific instruction set. One benefit is language integration. As you have
already seen, each .NET-aware compiler produces nearly identical CIL instructions. Therefore, all languages are able to interact within a well-defined binary arena.
Furthermore, given that CIL is platform-agnostic, the .NET Framework itself is platform-agnostic, providing the same benefits Java developers have grown accustomed to (i.e., a single code base running on numerous operating systems). In fact, there is an international standard for the C# language and for a large subset of the .NET platform, and implementations already exist for many non-Windows operating systems (more details at the conclusion of this chapter). In contrast to Java, however, .NET allows you to build applications using your language of choice.

Compiling CIL to Platform-Specific Instructions
Because assemblies contain CIL instructions rather than platform-specific instructions, CIL code must be compiled on the fly before use. The entity that compiles CIL code into meaningful CPU instructions is termed a just-in-time (JIT) compiler, which sometimes goes by the friendly name of Jitter. The .NET runtime environment leverages a JIT compiler for each CPU targeting the runtime, each optimized for the underlying platform.
For example, if you are building a .NET application that is to be deployed to a handheld
device (such as a Pocket PC), the corresponding Jitter is well equipped to run within a low-memory environment. On the other hand, if you are deploying your assembly to a back-end
server (where memory is seldom an issue), the Jitter will be optimized to function in a high-memory environment. In this way, developers can write a single body of code that can be
efficiently JIT-compiled and executed on machines with different architectures.
Furthermore, as a given Jitter compiles CIL instructions into corresponding machine code, it
will cache the results in memory in a manner suited to the target operating system. In this way, if a call is made to a method named PrintDocument(), the CIL instructions are compiled into platform-specific instructions on the first invocation and retained in memory for later use. Therefore, the next time PrintDocument() is called, there is no need to recompile the CIL.
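A rough way to observe this behavior is to time the first and second calls to the same method. The sketch below is only an approximation: the JitDemo class and its stand-in PrintDocument() method are hypothetical, and the measured tick counts depend on the machine, the runtime version, and whether the code was precompiled, so no particular numbers should be expected. On a typical JIT-compiled run, the first call tends to take noticeably longer than the second.

```csharp
using System;
using System.Diagnostics;

// Hypothetical type, used only to illustrate first-call JIT cost.
public static class JitDemo
{
    // A stand-in for the PrintDocument() method mentioned in the text.
    public static int PrintDocument(int pages)
    {
        int total = 0;
        for (int i = 1; i <= pages; i++)
        {
            total += i;
        }
        return total;
    }

    // Returns the elapsed stopwatch ticks for a single call to PrintDocument().
    public static long TimeCall(int pages)
    {
        Stopwatch sw = Stopwatch.StartNew();
        PrintDocument(pages);
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        // The first call usually includes the cost of compiling the CIL.
        Console.WriteLine("First call:  {0} ticks", TimeCall(10));
        // The second call reuses the machine code already held in memory.
        Console.WriteLine("Second call: {0} ticks", TimeCall(10));
    }
}
```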

Saturday, April 4, 2009

An Overview of .NET Assemblies

Regardless of which .NET language you choose to program with, understand that despite the fact
that .NET binaries take the same file extension as COM servers and unmanaged Win32 binaries
(*.dll or *.exe), they have absolutely no internal similarities. For example, *.dll .NET binaries do
not export methods to facilitate communications with the COM runtime (given that .NET is not
COM). Furthermore, .NET binaries are not described using COM type libraries and are not registered
into the system registry. Perhaps most important, .NET binaries do not contain platform-specific
instructions, but rather platform-agnostic intermediate language (IL) and type metadata. Figure 1-2
shows the big picture of the story thus far.

■Note There is one point to be made regarding the abbreviation “IL.” During the development of .NET, the official term for IL was Microsoft intermediate language (MSIL). However, with the final release of .NET, the term was changed to common intermediate language (CIL). Thus, as you read the .NET literature, understand that IL, MSIL, and CIL all describe exactly the same entity. In keeping with the current terminology, I will use the abbreviation “CIL” throughout this text.

When a *.dll or *.exe has been created using a .NET-aware compiler, the resulting module is
bundled into an assembly. You will examine numerous details of .NET assemblies in Chapter 11.
However, to facilitate the discussion of the .NET runtime environment, you do need to understand some basic properties of this new file format.
As mentioned, an assembly contains CIL code, which is conceptually similar to Java bytecode
in that it is not compiled to platform-specific instructions until absolutely necessary. Typically,
“absolutely necessary” is the point at which a block of CIL instructions (such as a method implementation) is referenced for use by the .NET runtime.
In addition to CIL instructions, assemblies also contain metadata that describes in vivid detail
the characteristics of every “type” living within the binary. For example, if you have a class named SportsCar, the type metadata describes details such as SportsCar’s base class, which interfaces are implemented by SportsCar (if any), as well as a full description of each member supported by the
SportsCar type.
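As a small illustration, the reflection API can read this type metadata back at runtime. In the sketch below, the SportsCar class, its Car base class, and the IDrivable interface are all hypothetical definitions invented for the example; Type.BaseType, Type.GetInterfaces(), and Type.GetMembers() are standard members of System.Type.

```csharp
using System;
using System.Reflection;

// Hypothetical types, defined only so there is something to inspect.
public interface IDrivable
{
    void Accelerate(int delta);
}

public class Car
{
    public int Speed;
}

public class SportsCar : Car, IDrivable
{
    public void Accelerate(int delta) { Speed += delta; }
}

public static class MetadataDemo
{
    public static void Main()
    {
        Type t = typeof(SportsCar);

        // The base class recorded in the type metadata.
        Console.WriteLine("Base class: {0}", t.BaseType.Name);

        // The interfaces the type implements.
        foreach (Type itf in t.GetInterfaces())
        {
            Console.WriteLine("Implements: {0}", itf.Name);
        }

        // The public instance members declared directly on SportsCar.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance
                           | BindingFlags.DeclaredOnly;
        foreach (MemberInfo m in t.GetMembers(flags))
        {
            Console.WriteLine("Member: {0} ({1})", m.Name, m.MemberType);
        }
    }
}
```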
.NET metadata is a dramatic improvement over COM type metadata. As you may already know,
COM binaries are typically described using an associated type library (which is little more than
a binary version of Interface Definition Language [IDL] code). COM type information has two problems:
it is not guaranteed to be present, and IDL code has no way to document
the externally referenced servers that are required for the correct operation of the current COM
server. In contrast, .NET metadata is always present and is automatically generated by a given
.NET-aware compiler.
Finally, in addition to CIL and type metadata, assemblies themselves are also described using
metadata, which is officially termed a manifest. The manifest contains information about the current version of the assembly, culture information (used for localizing string and image resources), and a list of all externally referenced assemblies that are required for proper execution. You’ll examine various tools that can be used to examine an assembly’s types, metadata, and manifest information over the course of the next few chapters.
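The manifest details described above can also be read programmatically. The following sketch (the ManifestDemo class and its helper are hypothetical names) prints the name, version, culture, and referenced assemblies of the running assembly using Assembly.GetName() and Assembly.GetReferencedAssemblies(). The exact values printed depend on how and where the code is compiled.

```csharp
using System;
using System.Reflection;

// Hypothetical type, used only to read manifest information.
public static class ManifestDemo
{
    // Returns the simple names of all assemblies referenced by the manifest.
    public static string[] GetReferencedNames(Assembly asm)
    {
        AssemblyName[] refs = asm.GetReferencedAssemblies();
        string[] names = new string[refs.Length];
        for (int i = 0; i < refs.Length; i++)
        {
            names[i] = refs[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        Assembly asm = Assembly.GetExecutingAssembly();
        AssemblyName name = asm.GetName();

        Console.WriteLine("Name:    {0}", name.Name);
        Console.WriteLine("Version: {0}", name.Version);

        // An empty culture name means the assembly is culture-neutral.
        string culture = name.CultureInfo == null ? "" : name.CultureInfo.Name;
        Console.WriteLine("Culture: {0}", culture.Length == 0 ? "neutral" : culture);

        foreach (string reference in GetReferencedNames(asm))
        {
            Console.WriteLine("References: {0}", reference);
        }
    }
}
```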

Friday, April 3, 2009

What C# Brings to the Table

Given that .NET is such a radical departure from previous technologies, Microsoft has developed
a new programming language, C# (pronounced “see sharp”), specifically for this new platform.
C# is a programming language whose syntax looks very similar (but not identical) to that of Java.
However, to call C# a Java rip-off is inaccurate. Both C# and Java are based on the syntactical
constructs of C++. Just as Java is in many ways a cleaned-up version of C++, C# can be viewed as
a cleaned-up version of Java—after all, they are all in the same family of languages.
The truth of the matter is that many of C#’s syntactic constructs are modeled after various
aspects of Visual Basic 6.0 and C++. For example, like VB6, C# supports the notion of formal type
properties (as opposed to traditional getter and setter methods) and the ability to declare methods
taking a varying number of arguments (via parameter arrays). Like C++, C# allows you to overload
operators, as well as to create structures, enumerations, and callback functions (via delegates).
Because C# is a hybrid of numerous languages, the result is a product that is as
syntactically clean as Java (if not cleaner), is about as simple as VB6, and provides just about
as much power and flexibility as C++ (without the associated ugly bits). In a nutshell, the C# language
offers the following features (many of which are shared by other .NET-aware programming
languages):
• No pointers required! C# programs typically have no need for direct pointer manipulation
(although you are free to drop down to that level if absolutely necessary).
• Automatic memory management through garbage collection. Given this, C# does not support
a delete keyword.
• Formal syntactic constructs for enumerations, structures, and class properties.
• The C++-like ability to overload operators for a custom type, without the complexity (e.g.,
making sure to “return *this to allow chaining” is not your problem).
• As of C# 2.0 (released with Visual Studio 2005), the ability to build generic types and generic members using a syntax very similar
to C++ templates.
• Full support for interface-based programming techniques.
• Full support for aspect-oriented programming (AOP) techniques via attributes. This brand of
development allows you to assign characteristics to types and their members to further qualify
their behavior.
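Several of the features listed above can be seen together in one short sketch. All of the names below (Point, BinaryOp, FeatureDemo, Sum, and Multiply) are hypothetical and exist only for illustration; the code shows a structure with formal properties, an overloaded operator, a parameter array, and a delegate used as a callback.

```csharp
using System;

// A structure with formal properties and an overloaded + operator.
public struct Point
{
    private int x;
    private int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    // Properties replace traditional getter and setter methods.
    public int X { get { return x; } set { x = value; } }
    public int Y { get { return y; } set { y = value; } }

    // Operator overloading for a custom type.
    public static Point operator +(Point a, Point b)
    {
        return new Point(a.x + b.x, a.y + b.y);
    }
}

// A delegate is a type-safe callback.
public delegate int BinaryOp(int a, int b);

public static class FeatureDemo
{
    // A parameter array accepts any number of int arguments.
    public static int Sum(params int[] values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }

    public static int Multiply(int a, int b)
    {
        return a * b;
    }

    public static void Main()
    {
        Point p = new Point(1, 2) + new Point(3, 4);
        Console.WriteLine("({0}, {1})", p.X, p.Y);   // prints "(4, 6)"

        Console.WriteLine(Sum(1, 2, 3));             // prints "6"

        BinaryOp op = new BinaryOp(Multiply);
        Console.WriteLine(op(6, 7));                 // prints "42"
    }
}
```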
Perhaps the most important point to understand about the C# language shipped with the
Microsoft .NET platform is that it can only produce code that can execute within the .NET runtime
(you could never use C# to build a native COM server or an unmanaged Win32 API application).
Officially speaking, the term used to describe the code targeting the .NET runtime is managed code.
The binary unit that contains the managed code is termed an assembly (more details on assemblies
in just a bit). Conversely, code that cannot be directly hosted by the .NET runtime is termed
unmanaged code.

Additional .NET-Aware Programming Languages
Understand that C# is not the only language targeting the .NET platform. When the .NET platform
was first revealed to the general public during the 2000 Microsoft Professional Developers Conference
(PDC), several vendors announced they were busy building .NET-aware versions of their
respective compilers. At the time of this writing, dozens of different languages have undergone
.NET enlightenment. In addition to the five languages that ship with Visual Studio 2005 (C#, J#,
Visual Basic .NET, Managed Extensions for C++, and JScript .NET), there are .NET compilers for
Smalltalk, COBOL, and Pascal (to name a few).
Although this book focuses (almost) exclusively on C#, Table 1-1 lists a number of .NET-enabled
programming languages and where to learn more about them (do note that these URLs are subject
to change).

Table 1-1. A Sampling of .NET-Aware Programming Languages
.NET Language Web Link                                                 Meaning in Life
http://www.oberon.ethz.ch/oberon.net                                   Homepage for Active Oberon .NET.
http://www.usafa.af.mil/df/dfcs/bios/mcc_html/a_sharp.cfm              Homepage for A# (a port of Ada to the .NET platform).
http://www.netcobol.com                                                For those interested in COBOL .NET.
http://www.eiffel.com                                                  For those interested in Eiffel .NET.
http://www.dataman.ro/dforth                                           For those interested in Forth .NET.
http://www.silverfrost.com/11/ftn95/ftn95_fortran_95_for_windows.asp   For those interested in Fortran .NET.
http://www.vmx-net.com                                                 Yes, even Smalltalk .NET is available.

Please be aware that Table 1-1 is not exhaustive. Numerous websites maintain a list of .NET-aware
compilers, one of which would be http://www.dotnetpowered.com/languages.aspx (again, the exact
URL is subject to change). I encourage you to visit this page, as you are sure to find many .NET
languages worth investigating (LISP .NET, anyone?).

Life in a Multilanguage World
As developers first come to understand the language-agnostic nature of .NET, numerous questions
arise. The most prevalent of these questions would have to be, “If all .NET languages compile down
to ‘managed code,’ why do we need more than one compiler?” There are a number of ways to answer
this question. First, we programmers are a very particular lot when it comes to our choice of programming
language (myself included). Some of us prefer languages full of semicolons and curly brackets,
with as few language keywords as possible. Others enjoy a language that offers more “human-readable”
syntactic tokens (such as Visual Basic .NET). Still others may want to leverage their mainframe skills
while moving to the .NET platform (via COBOL .NET).
Now, be honest. If Microsoft were to build a single “official” .NET language that was derived
from the BASIC family of languages, can you really say all programmers would be happy with this
choice? Or, if the only “official” .NET language was based on Fortran syntax, imagine all the folks out
there who would ignore .NET altogether. Because the .NET runtime couldn't care less which language
was used to build a block of managed code, .NET programmers can stay true to their syntactic preferences,
and share the compiled assemblies among teammates, departments, and external
organizations (regardless of which .NET language others choose to use).
Another excellent byproduct of integrating various .NET languages into a single unified software
solution is the simple fact that all programming languages have their own sets of strengths and weaknesses.
For example, some programming languages offer excellent intrinsic support for advanced
mathematical processing. Others offer superior support for financial calculations, logical calculations,
interaction with mainframe computers, and so forth. When you take the strengths of a particular programming
language and then incorporate the benefits provided by the .NET platform, everybody wins.
Of course, in reality the chances are quite good that you will spend much of your time building
software using your .NET language of choice. However, once you learn the syntax of one .NET language,
it is very easy to master another. This is also quite beneficial, especially to the consultants of
the world. If your language of choice happens to be C#, but you are placed at a client site that has
committed to Visual Basic .NET, you should be able to parse the existing code body almost instantly
(honest!) while still continuing to leverage the .NET Framework. Enough said.

Thursday, April 2, 2009

The .NET Solution

So much for the brief history lesson. The bottom line is that life as a Windows programmer has been
tough. The .NET Framework is a rather radical and brute-force approach to making our lives easier.
The solution proposed by .NET is “Change everything” (sorry, you can’t blame the messenger for the
message). As you will see during the remainder of this book, the .NET Framework is a completely new
model for building systems on the Windows family of operating systems, as well as on numerous
non-Microsoft operating systems such as Mac OS X and various Unix/Linux distributions. To set the
stage, here is a quick rundown of some core features provided courtesy of .NET:
• Full interoperability with existing code: This is (of course) a good thing. Existing COM binaries
can commingle (i.e., interop) with newer .NET binaries and vice versa. Also, Platform Invocation
Services (PInvoke) allows you to call C-based libraries (including the underlying API
of the operating system) from .NET code.
• Complete and total language integration: Unlike COM, .NET supports cross-language inheritance,
cross-language exception handling, and cross-language debugging.
• A common runtime engine shared by all .NET-aware languages: One aspect of this engine is
a well-defined set of types that each .NET-aware language “understands.”
• A base class library: This library provides shelter from the complexities of raw API calls and
offers a consistent object model used by all .NET-aware languages.
• No more COM plumbing: IClassFactory, IUnknown, IDispatch, IDL code, and the evil VARIANT-compliant
data types (BSTR, SAFEARRAY, and so forth) have no place in a native .NET binary.
• A truly simplified deployment model: Under .NET, there is no need to register a binary unit
into the system registry. Furthermore, .NET allows multiple versions of the same *.dll to
exist in harmony on a single machine.
As you can most likely gather from the previous bullet points, the .NET platform has nothing to
do with COM (beyond the fact that both frameworks originated from Microsoft). In fact, the only
way .NET and COM types can interact with each other is using the interoperability layer.
■Note Coverage of the .NET interoperability layer (including PInvoke) is beyond the scope of this book. If you
require a detailed treatment of these topics, check out my book COM and .NET Interoperability (Apress, 2002).

Introducing the Building Blocks of the .NET Platform (the CLR, CTS, and CLS)
Now that you know some of the benefits provided by .NET, let’s preview three key (and interrelated)
entities that make it all possible: the CLR, CTS, and CLS. From a programmer’s point of view, .NET
can be understood as a new runtime environment and a comprehensive base class library. The runtime
layer is properly referred to as the common language runtime, or CLR. The primary role of the
CLR is to locate, load, and manage .NET types on your behalf. The CLR also takes care of a number
of low-level details such as memory management and performing security checks.
Another building block of the .NET platform is the Common Type System, or CTS. The CTS
specification fully describes all possible data types and programming constructs supported by the
runtime, specifies how these entities can interact with each other, and details how they are represented
in the .NET metadata format (more information on metadata later in this chapter).
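One practical consequence of the CTS is that language keywords are simply aliases for CTS types. As a minimal sketch (the CtsDemo class and its CtsNameOf helper are hypothetical names), the C# int keyword and the System.Int32 type turn out to be one and the same; the VB .NET Integer keyword maps to that same type.

```csharp
using System;

// Hypothetical type, used only to show keyword-to-CTS-type mapping.
public static class CtsDemo
{
    // Returns the full CTS type name of any value.
    public static string CtsNameOf(object value)
    {
        return value.GetType().FullName;
    }

    public static void Main()
    {
        int a = 42;
        System.Int32 b = a;   // same type, so no conversion takes place

        Console.WriteLine(typeof(int) == typeof(System.Int32));  // prints "True"
        Console.WriteLine(CtsNameOf(b));                         // prints "System.Int32"
        Console.WriteLine(CtsNameOf("hello"));                   // prints "System.String"
    }
}
```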
Understand that a given .NET-aware language might not support each and every feature defined
by the CTS. The Common Language Specification (CLS) is a related specification that defines a subset
of common types and programming constructs that all .NET programming languages can agree on.
Thus, if you build .NET types that only expose CLS-compliant features, you can rest assured that all
.NET-aware languages can consume them. Conversely, if you make use of a data type or programming
construct that is outside of the bounds of the CLS, you cannot guarantee that every .NET programming
language can interact with your .NET code library.
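In C#, you can ask the compiler to check your public types against the CLS by applying the CLSCompliant attribute at the assembly level. The sketch below reuses the idea of a Calc class, but the AddUnsigned member and the ClsDemo class are hypothetical additions made for illustration. Unsigned 64-bit integers fall outside the CLS, so the member that exposes them is explicitly marked as noncompliant; without that marking, the compiler would be expected to issue a CLS-compliance warning.

```csharp
using System;

// Ask the compiler to check all public members against the CLS.
[assembly: CLSCompliant(true)]

public class Calc
{
    // int maps to System.Int32, which every CLS-aware language understands.
    public int Add(int x, int y)
    {
        return x + y;
    }

    // ulong (System.UInt64) is not part of the CLS, so this member is
    // explicitly opted out; some .NET languages may be unable to call it.
    [CLSCompliant(false)]
    public ulong AddUnsigned(ulong x, ulong y)
    {
        return x + y;
    }
}

public static class ClsDemo
{
    public static void Main()
    {
        Calc c = new Calc();
        Console.WriteLine(c.Add(10, 84));             // prints "94"
        Console.WriteLine(c.AddUnsigned(10UL, 84UL)); // prints "94"
    }
}
```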

The Role of the Base Class Libraries
In addition to the CLR and CTS/CLS specifications, the .NET platform provides a base class library
that is available to all .NET programming languages. Not only does this base class library encapsulate
various primitives such as threads, file input/output (I/O), graphical rendering, and interaction
with various external hardware devices, but it also provides support for a number of services required
by most real-world applications.
For example, the base class libraries define types that facilitate database access, XML manipulation,
programmatic security, and the construction of web-enabled (as well as traditional desktop and
console-based) front ends. From a high level, you can visualize the relationship between the CLR,
CTS, CLS, and the base class library.

Wednesday, April 1, 2009

The Philosophy of .NET

Life As a Java/J2EE Programmer
Enter Java. The Java programming language is (almost) completely object oriented and has its syntactic
roots in C++. As many of you are aware, Java’s strengths are far greater than its support for platform
independence. Java (as a language) cleans up many unsavory syntactical aspects of C++. Java (as
a platform) provides programmers with a large number of predefined “packages” that contain various
type definitions. Using these types, Java programmers are able to build “100% Pure Java” applications
complete with database connectivity, messaging support, web-enabled front ends, and a rich user
interface.
Although Java is a very elegant language, one potential problem is that using Java typically
means that you must use Java front-to-back during the development cycle. In effect, Java offers little
hope of language integration, as this goes against the grain of Java’s primary goal (a single programming
language for every need). In reality, however, there are millions of lines of existing code out
there in the world that would ideally like to commingle with newer Java code. Sadly, Java makes this
task problematic.
Pure Java is simply not appropriate for many graphically or numerically intensive applications
(in these cases, you may find Java’s execution speed leaves something to be desired). A better approach for such programs would be to use a lower-level language (such as C++) where
appropriate. Alas, while Java does provide a limited ability to access non-Java APIs, there is little
support for true cross-language integration.

Life As a COM Programmer
The Component Object Model (COM) was Microsoft’s previous application development framework.
COM is an architecture that says in effect, “If you build your classes in accordance with the
rules of COM, you end up with a block of reusable binary code.”
The beauty of a binary COM server is that it can be accessed in a language-independent manner.
Thus, C++ programmers can build COM classes that can be used by VB6. Delphi programmers
can use COM classes built using C, and so forth. However, as you may be aware, COM’s language
independence is somewhat limited. For example, there is no way to derive a new COM class using
an existing COM class (as COM has no support for classical inheritance). Rather, you must make use
of the more cumbersome “has-a” relationship to reuse COM class types.
Another benefit of COM is its location-transparent nature. Using constructs such as application
identifiers (AppIDs), stubs, proxies, and the COM runtime environment, programmers can
avoid the need to work with raw sockets, RPC calls, and other low-level details. For example, consider
the following VB6 COM client code:
' This block of VB6 code can activate a COM class written in
' any COM-aware language, which may be located anywhere
' on the network (including your local machine).
Dim c As MyCOMClass
Set c = New MyCOMClass ' Location resolved using AppID.
c.DoSomeWork

Although COM can be considered a very successful object model, it is extremely complex under
the hood (at least until you have spent many months exploring its plumbing—especially if you
happen to be a C++ programmer). To help simplify the development of COM binaries, numerous
COM-aware frameworks have come into existence. For example, the Active Template Library (ATL)
provides another set of C++ classes, templates, and macros to ease the creation of COM types.
Many other languages also hide a good part of the COM infrastructure from view. However, language
support alone is not enough to hide the complexity of COM. Even when you choose a relatively
simple COM-aware language such as VB6, you are still forced to contend with fragile registration
entries and numerous deployment-related issues (collectively termed DLL hell).

Life As a Windows DNA Programmer
To further complicate matters, there is a little thing called the Internet. Over the last several years,
Microsoft has been adding more Internet-aware features into its family of operating systems and
products. Sadly, building a web application using COM-based Windows Distributed interNet Applications
Architecture (DNA) is also quite complex.
Some of this complexity is due to the simple fact that Windows DNA requires the use of numerous
technologies and languages (ASP, HTML, XML, JavaScript, VBScript, and COM(+), as well as
a data access API such as ADO). One problem is that many of these technologies are completely
unrelated from a syntactic point of view. For example, JavaScript has a syntax much like C, while
VBScript is a subset of VB6. The COM servers that are created to run under the COM+ runtime have
an entirely different look and feel from the ASP pages that invoke them. The result is a highly confused
mishmash of technologies.
Furthermore, and perhaps more important, each language and/or technology has its own type
system (that may look nothing like another’s type system). A number in JavaScript (a double-precision floating-point value) is not quite the same
as an “Integer” in VB6 (a 16-bit signed integer).