Tuesday, March 27, 2007

The joy of performance bugs

I just realized it has been over a week since my last post. It has been a very busy week. I was able to get a lot of the old code ported (well, more like rewritten using the original code as a rough guide). The robot was able to navigate the maze and do decent obstacle avoidance. This all still needs a bit of work but it is usable for now. I want to get all the critical bits working at an acceptable level before I work on perfecting all of the code.

I then built my flame sensors which took a bit of time but no real surprises. I took some pics of the sensors but my wife borrowed the camera and I have not gotten around to downloading the photos off of it.

The "fun" part started when I added code to talk to the flame sensors. The sensors are connected directly to a couple of AVR ATMEGA32 microcontrollers (the ones I designed the boards for, which I posted about a few weeks back). The ARM (which is my main processor, running Linux) then talks to the AVRs using the i2c protocol. When I added this code I ran into some serious latency issues which slowed things down to the point where the robot would constantly hit walls and not be able to navigate at all. I had done some calculations based on the speed of the i2c bus and had figured that it would be able to read the sensor data many times over with no problems. In reality this obviously wasn't the case.
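For a sense of what that kind of back-of-the-envelope calculation looks like, here is a minimal sketch in C. The numbers are assumptions for illustration, not the ones from my notes: a 100 kHz clock (i2c standard mode) and a 2-byte sensor reading. Every byte on the bus costs 9 clocks (8 data bits plus the ACK bit), and a read starts with one address byte. The function names are made up for this example.

```c
/* Clock cycles for one i2c read of n_bytes data bytes: one address byte
 * plus n_bytes data bytes, each taking 8 data bits and 1 ACK/NACK bit.
 * The START and STOP conditions are ignored as negligible. */
static unsigned long i2c_read_clocks(unsigned long n_bytes)
{
    return (n_bytes + 1UL) * 9UL;
}

/* Ideal duration of that read in microseconds at the given SCL frequency.
 * Assumes the slave never holds the clock line low (no clock stretching),
 * which is exactly the assumption that turned out to be wrong. */
static double i2c_read_usec(unsigned long n_bytes, double scl_hz)
{
    return (double)i2c_read_clocks(n_bytes) * 1e6 / scl_hz;
}
```

With those assumed numbers, `i2c_read_usec(2, 100000.0)` comes out at 270 microseconds, which on paper allows thousands of reads per second. The estimate only covers time on the wire and says nothing about how long the slave takes to get ready.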

I was able to get the "oprofile" tool cross compiled for the ARM so I could gather some data to see where the time was going. The data showed that over 20% of the time was being spent inside the i2c driver. Keep in mind that this ARM board does not actually have an i2c device, so I am using the i2c "bit bang" driver, which implements the bus in software using a couple of general purpose i/o pins. Still, it should not have been eating up as much time as it did.

After a _lot_ of debugging and experimenting I came to the realization that the real bottleneck was that the AVR was holding the clock line low (which the protocol uses as a way for the slave device to tell the master device that it is not yet ready). I also realized that the Linux bit-bang i2c driver doesn't block in this situation (at least not when running with kernel preemption disabled, as is the default for ARM). So the ARM was wasting time spinning while waiting for the clock line to be released.
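The spinning described above can be sketched in plain userspace C. This is not the kernel driver's actual code, just a simplified model of the pattern: the master releases the clock line and keeps sampling it until the slave lets it go high. The `scl_reader` callback, the `fake_slave` struct and the poll budget are all hypothetical names invented for the illustration.

```c
#include <stdbool.h>

/* Hypothetical callback that samples the clock pin: true means SCL is high. */
typedef bool (*scl_reader)(void *ctx);

/* Poll SCL until the slave releases it or the poll budget runs out.
 * Returns the number of samples taken, or -1 on timeout. Every sample
 * that sees the line low is CPU time the master spends doing nothing. */
static long wait_scl_high_spinning(scl_reader read_scl, void *ctx, long max_polls)
{
    for (long polls = 1; polls <= max_polls; polls++) {
        if (read_scl(ctx))
            return polls;
    }
    return -1;
}

/* Fake slave for experimenting: holds SCL low for a fixed number of samples. */
struct fake_slave {
    long low_samples_left;
};

static bool fake_scl_read(void *ctx)
{
    struct fake_slave *slave = ctx;

    if (slave->low_samples_left > 0) {
        slave->low_samples_left--;
        return false;   /* slave still busy, clock stretched */
    }
    return true;        /* slave released the line */
}
```

A slave that stays busy for 5 samples makes the master take 6 before it can continue, and the cost grows in step with how slow the slave is.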

So I got a chance to learn some more Linux kernel internals! This of course was one of my goals for this project. I was able to modify the code so that it would block if the clock line was being held low and I also added an interrupt handler so that it could wake up when the line was then released.
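The kernel change itself is not shown here, but the idea of "sleep until an interrupt says the line is free" has a close userspace analogue, sketched below with a pthreads condition variable. This is only an analogy under stated assumptions: in the kernel the equivalent would be something like a wait queue woken from the interrupt handler, and the names `scl_line`, `scl_line_release` and `scl_line_wait_high` are invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of the clock line plus the means to sleep on it. */
struct scl_line {
    pthread_mutex_t lock;
    pthread_cond_t released;
    bool high;
};

static void scl_line_init(struct scl_line *line)
{
    pthread_mutex_init(&line->lock, NULL);
    pthread_cond_init(&line->released, NULL);
    line->high = false;   /* start with the slave holding the line low */
}

/* Stands in for the interrupt handler: mark the line high and wake the waiter. */
static void scl_line_release(struct scl_line *line)
{
    pthread_mutex_lock(&line->lock);
    line->high = true;
    pthread_cond_signal(&line->released);
    pthread_mutex_unlock(&line->lock);
}

/* Stands in for the modified driver path: sleep instead of spinning.
 * The loop guards against spurious wakeups. */
static void scl_line_wait_high(struct scl_line *line)
{
    pthread_mutex_lock(&line->lock);
    while (!line->high)
        pthread_cond_wait(&line->released, &line->lock);
    pthread_mutex_unlock(&line->lock);
}

/* Thread body playing the slave that eventually releases the line. */
static void *fake_slave_thread(void *arg)
{
    scl_line_release(arg);
    return NULL;
}
```

While the waiter sleeps inside `pthread_cond_wait` it uses no CPU, so other threads get to run in the meantime. That is the whole point of blocking rather than spinning.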

This didn't _quite_ fix it. My code is made up of multiple threads, each of which implements a different behavior of the robot. Since the thread that was looking at the flame sensors was constantly getting blocked on interrupts, the Linux scheduler essentially increased its priority, because it then looked like an "interactive" thread. This is the same functionality that allows your text editor to remain responsive even when someone else is running another process that is compute bound (e.g. a database). This part I knew how to fix using some pthreads scheduler magic. In the process of fixing this I implemented some other thread related optimizations I had been considering.
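One common form of pthreads scheduler magic is to take a thread out of the normal time-sharing policy and give it an explicit, fixed priority, so that the scheduler's interactivity heuristics no longer apply to it. The sketch below shows that general technique using standard POSIX calls. It is not necessarily what the robot code does, and the helper name `make_fixed_priority_attr` is made up.

```c
#include <pthread.h>
#include <sched.h>

/* Prepare thread attributes for a thread that runs under SCHED_FIFO at a
 * fixed priority. Returns 0 on success or an errno-style code on failure. */
static int make_fixed_priority_attr(pthread_attr_t *attr, int priority)
{
    struct sched_param param = { .sched_priority = priority };
    int err;

    if ((err = pthread_attr_init(attr)) != 0)
        return err;
    /* Without this the new thread inherits the creator's policy and the
     * policy and priority set below are ignored. */
    if ((err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return err;
    if ((err = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0)
        return err;
    return pthread_attr_setschedparam(attr, &param);
}
```

The resulting `attr` is passed as the second argument to `pthread_create`. On Linux, creating a SCHED_FIFO thread normally requires root or the CAP_SYS_NICE capability, and `pthread_create` fails with EPERM otherwise. Valid priorities lie between `sched_get_priority_min(SCHED_FIFO)` and `sched_get_priority_max(SCHED_FIFO)`.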

The robot can now navigate the maze with no critical issues (it bumps walls on occasion but not excessively) and is able to find and blow out the candle. None of this is perfect yet, as I mentioned, but it does work.

My next goal is to get the audio start working. For the "expert" division of the contest you are required to start the robot using a tone that is supposed to simulate a smoke detector going off. I have a few schematics for circuits to do this and I have all the parts, but this is one of the analog electronics things I am not all that comfortable with. Hopefully I can get this working without any more big delays. According to the schedule I put together before I started, I am 2 weeks behind. Obviously I won't be able to do everything I wanted to do. My schedule did have a full 2 weeks at the end for nothing but debug time, so it sounds like that is what I will lose. I guess I just can't allow myself to have any bugs in my code ;)

2 comments:

Anonymous said...

So ... while you've been goofing off the past couple of months I've lost track of when you're back -- mind emailing me?

Rockhopper looks great. You'll have to bring it in to show it off!

P.

Unknown said...

Doug -- looking good. Sounds like the threads and interrupts gave you a good education... did you learn anything about dealing with pesky people :-) Jim