Friday, November 14, 2003
It’s Gonna be a Long Weekend
Most people in Seattle will be blissfully unaware, but all over the area small (and not so small) groups of people will be getting together to watch TV at 1am: yes, it is the semi-finals of the Rugby World Cup this weekend.
Tomorrow is the battle of the Southern Hemisphere, Australia v New Zealand, and on Sunday morning it is the turn of the Northern Hemisphere with England v France. The games start at 1am Seattle time, so it looks like I won’t be getting much sleep.
Who am I supporting? Even though the Irish were knocked out by France I still can’t quite bring myself to support the English, so I’ll be shouting for “Les Bleus”. Between Australia and New Zealand I think I have a marginal preference for New Zealand.
Working on a compiler can be difficult: you can make what you think is a minor change to fix a bug and check it in to the source tree. If you get the fix wrong and you are lucky, you’ll have your QA team telling you that you just broke a whole series of tests. If you are unlucky, the bug will lurk for a few weeks or months and the first you’ll hear of it is when some build-lab owner in NT is phoning you up demanding to know why you broke their build and could you get it fixed already. If you are really unlucky, their VP will call up your VP and the next thing you know you are trying to explain why you broke the NT build to some really senior (and annoyed) people.
So to try and avoid problems like these each developer has to run a series of test suites before they can make a check-in. When I first joined the Microsoft C++ compiler team there were 3 test suites that had to be run:
· Sniff – this took 20 minutes to run and just validated that the most obvious tests like “Hello World” and scribble.exe still built.
· 2Hr BVT & 6Hr BVT – these were larger Build Validation Test suites and, as the names suggest, they took 2 hours and 6 hours to run respectively.
Today we still have these 3 suites and while the Sniff suite still takes around 20 minutes to run both the 2Hr and 6Hr suites now finish in less than 30 minutes – not because we have reduced the number of tests but because machines are now so much faster.
Over time we came to realize that these 3 suites were not enough and so over the years we have added more suites: we now have test suites that target conformance to the C++ Standard, suites that target code that uses attributes, suites that build and execute 3rd Party Libraries like Boost, suites that test the parser we use to provide Intellisense and, most recently, suites that target managed code. Currently before any check-in is made to the compiler source tree a developer will run approximately 14 different test suites.
But even with all these suites we still found we were running into issues. A lot of them took the form of a change in one component breaking another: a change to the compiler parser would break the linker, a change to the optimizer would break the parser, or a change to the C runtime would break everyone (Sorry Martyn ☺). There were also issues where a change to the IA-32 compiler would break either or both of the IA-64 and the AMD-64 compilers (and vice-versa). On top of these desktop platforms the compilers are now used to target all the chips that are used by the Windows CE team. On all platforms there were issues where a retail build would work but a debug build would fail. So it was suggested that each time a change is ready to be checked in, each developer should run every other team’s suites as well as their own team’s, and that they should run these suites on all platforms and for all builds (retail/debug/test).
While this may sound like a great idea in theory it is not remotely practical: the number of combinations of suites, builds and platforms is huge, and not every developer has access to all the different machines necessary to run all the tests. There was also always the problem of a developer “forgetting” to run a suite before they checked in (“I know this change cannot possibly break anything on IA-64”): wrong! So it was clear that we needed a process that was fast, required little or no developer intervention, and could handle running multiple test suites on multiple platforms: welcome to Gauntlet.
“Running the Gauntlet” is a term for a form of medieval punishment in which the miscreant would have to run between 2 lines of knights who would attempt to hit him (or her) with their gauntlets. There is an image of a rather tamer version in the Pieter Brueghel painting “Young Folks at Play” (Note: Pieter Brueghel was the eldest son of the famous Flemish painter Pieter Bruegel).
What is Gauntlet? It is a program that runs on a server and serializes all check-ins. It works as follows:
When a developer is ready to check in, they open up a web-page on the Gauntlet machine and fill out some information about what tree they are checking into (Parser, Optimizer, Linker, Runtime) and what files they are changing. They then submit the check-in to Gauntlet. The Gauntlet machine will then take the diffs from the developer’s machine, apply them to its own copy of the source code, and then run a whole series of builds and tests on different platforms. Currently, for a check-in to the parser, the Gauntlet machine will build about 12 different variations of the compiler and will then run about 35 suites from all areas and on all platforms.
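The flow above can be modelled as a simple queue. The Python sketch below is a toy, not the real Gauntlet: the class names, the `run_suite` callback and the rule that a check-in is recorded as accepted only when every suite passes on every flavor are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CheckIn:
    developer: str
    tree: str     # tree named in the post: Parser, Optimizer, Linker or Runtime
    files: list   # files the developer is changing


class Gauntlet:
    """Toy model: one queue, one check-in validated at a time."""

    def __init__(self, flavors, suites, run_suite):
        self.flavors = flavors        # build variations to produce
        self.suites = suites          # suites to run against each variation
        self.run_suite = run_suite    # callable(checkin, flavor, suite) -> bool
        self.queue = deque()
        self.accepted = []

    def submit(self, checkin):
        """Queue a check-in; nothing reaches the tree until it is processed."""
        self.queue.append(checkin)

    def process_next(self):
        """Validate the oldest queued check-in against every flavor and suite."""
        checkin = self.queue.popleft()
        failures = [
            (flavor, suite)
            for flavor in self.flavors
            for suite in self.suites
            if not self.run_suite(checkin, flavor, suite)
        ]
        if not failures:
            self.accepted.append(checkin)
        return checkin, failures


def fake_run_suite(checkin, flavor, suite):
    # Stand-in for real builds and tests: pretend "bad.cpp" breaks debug builds.
    return not (flavor.endswith("debug") and "bad.cpp" in checkin.files)


gauntlet = Gauntlet(
    flavors=["IA-32 retail", "IA-32 debug", "IA-64 retail"],
    suites=["Sniff", "2Hr BVT", "6Hr BVT"],
    run_suite=fake_run_suite,
)
gauntlet.submit(CheckIn("alice", "Parser", ["good.cpp"]))
gauntlet.submit(CheckIn("bob", "Optimizer", ["bad.cpp"]))

first, first_failures = gauntlet.process_next()
second, second_failures = gauntlet.process_next()
print(first.developer, first_failures)
print(second.developer, second_failures)
```

Because the queue hands out one check-in at a time, each change is tested against a tree that already contains everything accepted before it.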
“Doesn’t this take forever?” I hear you ask. No: Gauntlet is not just one machine, it is a cluster of about 30 machines (most IA-32 but also some IA-64 and AMD-64). Once Gauntlet has built a particular flavor of the compiler it farms out the suites for that flavor to other machines, so all the testing can be done in parallel. A check-in to the parser takes Gauntlet just over 1 hour, but if we serialized all the building and testing it would take closer to 12 hours. This means we get a maximum amount of testing in a minimum amount of time.
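A back-of-the-envelope way to see where the speed-up comes from is to compare the total of all suite run times with the finishing time when the suites are spread over many machines. The Python sketch below uses a simple greedy rule (give the next suite to whichever machine frees up first); that rule, the uniform 20-minute suite time and the function names are assumptions for illustration, and it ignores build time and the per-platform split of the real cluster.

```python
import heapq


def serial_minutes(durations):
    """Total time if one machine ran every suite back to back."""
    return sum(durations)


def parallel_minutes(durations, machines):
    """Finishing time when each suite (longest first) goes to the machine
    that becomes free earliest."""
    finish_times = [0] * machines
    heapq.heapify(finish_times)
    for duration in sorted(durations, reverse=True):
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + duration)
    return max(finish_times)


# Hypothetical workload: 35 suites at an assumed 20 minutes each, 30 machines.
durations = [20] * 35
print(serial_minutes(durations))        # 700 minutes on one machine
print(parallel_minutes(durations, 30))  # 40 minutes across 30 machines
```

With these made-up numbers the suites alone drop from nearly 12 hours to under one, which is the same order of improvement described above.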
Having Gauntlet has really helped us to improve the quality of the whole compiler toolset: it’s not perfect (it can take a while to get your turn) but it is much better than leaving all the testing up to individual developers.
I’ll probably come back to our development process again in the future, but if you have any questions or comments please feel free to leave me some comments and I’ll try to address them in a future blog.
One question I have gotten is “Why doesn’t your blog have RSS?” It’s a long story. Basically www.gotdotnet.com decided to stop accepting any more new bloggers (at least temporarily), so I decided to use blogger.com (and blogspot.com), both of which are now owned by Google. Unfortunately, to get RSS I need to upgrade to the professional version, but at the moment they are not accepting any more upgrades ☹ so for now I am stuck without RSS. Sorry.
Thursday, November 13, 2003
What compiler do you use?
This is a common question I get asked at a lot of conferences: users are always interested in what tools we use internally.
The answer is complicated.
First: there is no one single version of the compiler which is used by every team at Microsoft. Each team is free to choose whichever version of the compiler best suits their needs. For a lot of teams this is a previously released version of the product: some teams use the compiler that shipped with Visual C++ .NET 2003 (AKA Visual C++ 7.1), other teams use the compiler that shipped with Visual C++ .NET (AKA Visual C++ 7.0), some teams still use Visual C++ 6.0, and I have even heard reports of a team that still uses Visual C++ 5.0 (Why? I have no idea).
But for some teams, like the Visual Studio team and the Windows team, a previously shipped product is not new enough: they need access to features that are not yet ready to ship. So these teams need to engage in that great Microsoft tradition: dogfooding.
Dogfooding is not only a great way for these teams to get early access to the features that they need but it is also a great way for the compiler team to get our latest compiler heavily tested on some real world code: running compiler test suites is one thing (and a very important part of our testing strategy) but there is no test like being able to compile and boot NT.
How does dogfooding work? Every few months the Visual C++ team decides that they want to start an LKG push. LKG stands for Last Known Good: an LKG build is a complete compiler tool set that the Visual C++ team feels is good enough to be used outside of the immediate team. The LKG process involves first picking a daily build that we feel is of a high enough quality: basically a build that passes all our front-line testing. Once we have such a build we create a branch off our main development source code control depot and this branch becomes the LKG branch. This compiler is then subjected to a full test pass and any issues that are found are fixed in the LKG branch (and, of course, in the main branch). Once our QA feels that the LKG has reached an appropriate level of quality we start dropping early releases to those teams that are interested (different teams pick up releases at different times: if a team is just about to ship, the last thing they need is a new compiler). These teams will build their product and run their own internal tests. Any issues they find are reported back to the Visual C++ team and if necessary the LKG build is updated. Finally, when all the teams agree that the quality of the compiler is high enough, the tool set is released and the other teams can pick it up and use it to do their daily builds.
But the story does not end there. No matter how much testing we do, issues may slip through, so it is possible for a team like NT to find an issue several months after we have released an LKG. In these cases we have to track down the issue, fix it, patch the LKG and release the updated tool set.
As you would expect, the most extreme dogfooders of the compiler tools are the compiler team themselves: we use tools that are only days old. Yes, this can have its drawbacks, as the tools can sometimes become unstable, but these problems are minor when compared to the benefits we get from a lot of people using the latest version of the tools.
In my next blog I'll talk some more about our internal development processes and how we try to guarantee that every build of the compiler is a high quality build.
A little bit about myself. I've been working on compilers for longer than I care to remember (Pascal, Fortran, Modula-2, C and most recently C++). For the last 10 years I've been a developer on the Visual C++ compiler team: currently I am also the Microsoft representative on the ISO C++ committee WG21/X3J16.
The main focus of my current activities is C++ for the .NET platform.
I am hoping that I can use this blog to provide insights into the decision making and development process behind Visual C++: basically what we're doing and why we're doing it.