Making the same mistake twice

We all make mistakes – after all, we’re only human.  The key thing is to learn from them.

I’ve made a few in my time.  Like when I improved my ZX81’s power supply by adding a linear regulator to drop the voltage so that the computer ran cooler.  I also decided to house this new power supply in a metal box which of course had to be earthed.  The other problem with the ZX81 was that the programs were loaded by tape and my machine was unreliable.  So I came up with the idea of using the speaker output from my hi-fi to improve the drive into the phones socket.  It looked good for about 40 seconds at which point one of the output transistors, which was now shorted to the earth via the new power supply box, went up in a puff of smoke.  Needless to say, I knew what had happened immediately and cursed my stupidity.

Although it was slightly less dramatic, years later I came across an odd problem with a customer board.  I said in my last post that it’s ideal to have a proper debugger where you can set breakpoints and so on.  However, we also used a debug application that connected via the serial port.  It was a bit like HyperTerminal but with a fancy front-end that gave you access to all of the chip’s registers and would allow you to navigate the various menu states without having to provide the key presses or infra-red buttons.

On this particular customer’s board, the serial debug interface was not working and, according to the software engineers, had never worked.  The odd thing is that the serial port appeared to be fine because the customer could use it to flash the code down to the board.  It wasn’t a big issue because the engineers could read the registers by looking at a dump of the memory map, but it was definitely more convenient to have the register names rather than the memory addresses, so I wanted to find out why it wasn’t working.
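To show why register names beat raw addresses, here is a minimal Python sketch of the kind of labelling a debug front-end does for you.  The register names and addresses are invented for illustration and do not describe our chip or the real tool.

```python
# Hypothetical register map: these names and addresses are made up for
# illustration and do not correspond to any real device.
REGISTER_NAMES = {
    0x4000: "UART_CTRL",
    0x4001: "UART_BAUD",
    0x4002: "GPIO_DIR",
    0x4003: "GPIO_OUT",
}


def annotate_dump(base_address, data):
    """Return one text line per byte, labelled with a register name if known."""
    lines = []
    for offset, value in enumerate(data):
        address = base_address + offset
        name = REGISTER_NAMES.get(address, "(unnamed)")
        lines.append(f"0x{address:04X}  {name:<10} = 0x{value:02X}")
    return lines


if __name__ == "__main__":
    # Example dump bytes, also invented for the demonstration.
    dump = bytes([0x03, 0x1A, 0xF0, 0x80, 0x00])
    for line in annotate_dump(0x4000, dump):
        print(line)
```

Without the lookup table you are left matching addresses against the datasheet by hand, which is workable but slow.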

A look at the customer’s schematic revealed the issue.  The hardware designer, instead of powering the serial port from the supply, had powered it from a GPIO pin of the microcontroller.  Presumably he wanted to use the micro to disable or enable the serial interface depending on the state of the GPIO pin.  The result was that there was just enough current to be able to flash code down to the board but not enough to handle the different baud rate for debugging.

I cut the track and put a wire straight to the supply and was then able to both flash the code and run the debug interface.

The hardware engineer moved companies, to their largest competitor, where he designed a board with another of our chips.  When I came to debug that board, yes you’ve guessed it, I could flash code but the serial debug interface did not work.  He had made the same mistake twice!

Needless to say, when he asked for a recommendation on LinkedIn, I ignored the request.

…but how do you debug the code?

Continuing the theme of board bring-ups, I did one a few years ago with a high-profile customer, probably the foremost manufacturer of LCD TVs in Europe.  We were the back-end scaling and LCD driving part of the design and Micronas were the front-end video decoder and deinterlacer.  They had sent over 3 FAEs to tune their video decoder whereas it was just me from our side.

However, it wasn’t just the FAEs that were unbalanced.  The customer had 3 hardware engineers and just 1 software engineer.  I had done my homework and got the latest software build for our part so I was pretty confident that we’d get the software running quickly and start knocking off their software bugs at a good rate.

I would normally insist on doing a schematic and layout check before the customer signed off the PCB for manufacture but I figured that these guys knew what they were doing, so I didn’t in this case.  As it turned out, that was a mistake.

I had unpacked my evaluation board and ROM emulator (this was before the days of JTAG debugging) and was raring to go when I noticed something odd about their board.  I went to plug in my emulator and found there was no debug socket to plug it into.  When I asked where it was, they told me they hadn’t put one on because they didn’t think they would need it.

So I asked the software engineer, “How do you debug the code without an emulator?”

“Oh,” he said, “I just flash it down and use printf debugging.”

Well, anyone who has ever debugged code knows that printf debugging is only really workable for small amounts of code or where you really have no other choice.  To debug the amount of code in your average TV you really need to set breakpoints in the code, single-step through lines of code and be able to inspect variables.

Bearing in mind that this was the first prototype board and there are usually going to be hardware changes before the final hardware, I could not believe they had not put a debug connector on the board.  By all means, don’t populate it on the production board, or even put it on a break-off board.  It’s not as if there was any space constraint!

Well, I said to the 3 hardware engineers who were standing around twiddling their thumbs, “Take this board and de-solder the FLASH memory.  Then wire the address and data pins to this connector.”  After about an hour, they came back with the board.  I plugged in the emulator and the code ran nicely.  I could set hardware breakpoints and debug code quickly.

The story has a happy ending.  Not only did the board work well but it went on to be this company’s best-selling LCD TV chassis ever!

Software or Hardware problem?

There haven’t been many interesting enquiries this week, so I thought I’d tell you about an interesting issue I had a few years ago bringing up a customer’s first prototype board.  Whenever a customer starts an embedded design, I like to schedule some time with them as soon as the first boards come back to make sure the hardware comes up alive and ready for the software development to start.

The day before the scheduled bring-up, I had a call from the software engineer to say that they had done some initial hardware tests on the first board and it seemed fine.  However, he had tried to run the software he had been developing and the emulator could not connect to the board, with the debugger reporting an unhelpful error message.

So, the first thing to do was to get a scope and look at the reset, address and data signals.  The scope was of a certain age (you could tell that by the monochrome orange display) but it had a facility to print the trace on the screen, so we could compare the signals from the evaluation board and the customer’s board.

On the evaluation board, we could see a nice reset followed by a burst of addresses and data as the code loaded and ran.  With the customer’s board, we got a reset, the start of the addresses as it jumped to the interrupt vector but then nothing.  Not many other clues except one of the signals appeared to be about 0.6V below where it should be.
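The comparison we did by eye, known-good board against suspect board, can be automated if the bus activity is captured as data rather than printed on paper.  Here is a minimal Python sketch of the idea.  The address values are invented for illustration and are not taken from either board.

```python
from itertools import zip_longest


def first_divergence(reference, suspect):
    """Return (index, reference_value, suspect_value) at the first mismatch.

    A value of None means that capture had already ended at that index.
    Returns None if the two captures are identical.
    """
    for index, (ref, sus) in enumerate(zip_longest(reference, suspect)):
        if ref != sus:
            return index, ref, sus
    return None


if __name__ == "__main__":
    # Hypothetical address captures: the suspect board starts the same way
    # as the reference board and then simply stops.
    evaluation_board = [0x0000, 0x0002, 0x0100, 0x0102, 0x0104]
    customer_board = [0x0000, 0x0002]
    print(first_divergence(evaluation_board, customer_board))
```

The point at which the two captures part company tells you where to start looking, whether the cause turns out to be software or hardware.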

So we dug out the schematic and chased it back to a transistor.  Although this was a low-power micro with some extremely low-power (in the nanoamps) power-down modes, the hardware engineer had put in a transistor to switch off the battery power if needed.  The hardware engineer disappeared for a few minutes to remove it from the board and lo and behold the emulator connected and the code ran happily.

The moral of the story is that although you may be getting error messages from the software debugger (which may or may not be helpful and informative) it could be a hardware issue preventing your code from running.  So get a fresh pair of eyes to check your schematic and layout to catch any omissions or errors before you commit to a board.