There are quite a few articles claiming that the era of 8 bit MCUs is at its end, and there is a fair point in that: prices have converged, and one can now obtain low-end ARM microcontrollers at a price matching that of 8 bit micros. However, having worked with 32 bit ARMs for a while, I don't think the situation is that simple. It could be, but... But there are these "but"s. So here I summarize some experiences and thoughts on the matter.
The 32 bit experience
I have actually been using a 32 bit micro for a long time: the formerly Atmel AVR32. I found these quite comfortable to work with (probably especially in retrospect), even though at some point I encountered a nasty silicon error which I had to figure out all by myself. The MCU is complex, but not impossible to understand from its technical reference manual; I learned to program it at bare metal level rather quickly.
Now why is bare metal significant? Because the alternative is working through vendor provided libraries, as is typical with ARM and apparently other 32 bit micros.
Having a library is nice as long as understanding the library itself is sufficient, and it comes with documentation as adequate as the technical reference manual of the MCU itself. What I found in practice is that neither is true, at least not for the ST ARMs I have been working with recently.
The library provided for these ARMs is not very well documented, and as far as I found, as soon as you really need the performance and real-time capabilities which make the faster 32 bit MCU better than the 8 bitter, you have to start digging down, finally ending up wrestling with both the complexity of the library and that of the MCU itself.
As an example, on an STM32F401 running at 84MHz (the specified maximum for this MCU), setting up an SPI transaction by DMA using the ST library takes about 7 microseconds, about 500 CPU clocks. Moreover, the library's implementation of this transaction fires off a couple of other interrupts which we didn't need and which got in the way. In the end, to meet our real-time goals, we had to do this at bare metal level, figuring it all out mostly ourselves, partly by dissecting the library code, as most of the resources obtainable from the Internet describe these tasks using the ST library.
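To illustrate the difference, here is a minimal sketch of what the bare metal version boils down to: a handful of register writes. The register structs and bit names below are a simplified mock only loosely following the STM32F4 reference manual (the real CMSIS definitions differ in layout and detail), so the snippet compiles and runs anywhere:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mock of an STM32F4-style DMA stream and SPI peripheral.
 * Names loosely follow the reference manual (DMA_SxCR, SPI_CR2), but
 * the layout here is illustrative only. */
typedef struct {
    volatile uint32_t CR;    /* stream control: EN, MINC, direction, channel */
    volatile uint32_t NDTR;  /* number of data items to transfer */
    volatile uint32_t PAR;   /* peripheral address */
    volatile uint32_t M0AR;  /* memory address */
} dma_stream_t;

typedef struct {
    volatile uint32_t CR2;   /* control: TXDMAEN bit enables DMA requests */
    volatile uint32_t DR;    /* data register (the DMA target) */
} spi_t;

#define DMA_SxCR_EN     (1u << 0)
#define DMA_SxCR_MINC   (1u << 10)
#define SPI_CR2_TXDMAEN (1u << 1)

/* Fire off a memory-to-peripheral transfer: a handful of register
 * writes, no library calls, no extra interrupts. */
static void spi_tx_dma_start(dma_stream_t *s, spi_t *spi,
                             const uint8_t *buf, uint32_t len)
{
    s->CR &= ~DMA_SxCR_EN;              /* stream must be disabled to program it */
    s->PAR  = (uint32_t)(uintptr_t)&spi->DR;
    s->M0AR = (uint32_t)(uintptr_t)buf;
    s->NDTR = len;
    s->CR  |= DMA_SxCR_MINC;            /* increment the memory address only */
    spi->CR2 |= SPI_CR2_TXDMAEN;        /* let the SPI raise DMA requests */
    s->CR  |= DMA_SxCR_EN;              /* go */
}
```

On the real part a few more fields are involved (channel selection, data direction, flag clearing), but the whole setup is still on the order of a dozen register accesses, nowhere near 500 clocks.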
With MCUs which have no such vendor provided libraries, and which see some usage, you at least get relevant information on using them efficiently by researching. The point is that you essentially get much less than "advertised" when deciding on such a 32 bit MCU, and it can become a nasty struggle to get anywhere near the MCU's actual capabilities.
A rarer problem, but one I have already encountered, is when you happen to stumble upon some silicon defect (in our case it was most likely an MCU very subtly damaged during handling). With these significantly more complex MCUs, under the layer of the vendor provided library, it can become nigh impossible to dig down to the root of the problem; you only get random, inexplicably weird behaviour which, until you get to the bottom of it, will most likely be attributed to your poor coding skills. Definitely not good for a career.
Nonvolatile storage and prices
There is one thing I never saw in a 32 bitter, but which is a staple component of just about any 8 or the occasional 16 bitter: useful nonvolatile storage for parameter data. Let's see what we have in this regard:
- ST ARMs: EEPROM emulation in MCU Flash. 10K write cycles; in the case of the STM32F401, 4x 16K and one 64K erase block. That's all you have.
- Atmel (Microchip) 8 bit AVRs: all of them have EEPROM, typically about 1/32nd the size of the MCU Flash, 100K write cycles, with an erase page size of a few bytes.
- Microchip PICs: like the 8 bit AVRs, they have EEPROM with similar characteristics.
- Texas Instruments MSP430s: these are awesome in this regard; the FR series contain FRAM instead of Flash, an offer which simply has no single-chip equivalent.
Why is nonvolatile storage important? Very few designs can be completely stateless, with no need to record parameters meant to be preserved across power cycles. Moreover, it is commonly also a requirement to be robust against losing such stored data, or at least some of the data items.
When using the aforementioned STM32F401, if you also need a bootloader, and you have the ST library to work with (making it outright impossible to fit a USB capable bootloader in 16K), you are immediately pretty much locked into the only way you can structure your internal Flash: set aside 32K for the bootloader, and 2x 16K pages for EEPROM emulation (so you can realize an algorithm which can preserve a piece of data even while erasing one of those pages), and you are left with 64K for the application, while having only a very constrained "EEPROM".
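The two-page scheme mentioned above can be sketched roughly as follows. This is a toy model running against RAM-simulated flash: the page size, the 2 byte key-value record format and the function names are all made up for illustration, and real code would add power-loss-safe page headers and wear checks. The core idea survives intact though: data is never erased before its latest value has been copied to the other page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 64      /* illustrative; real erase pages are 16K here */
#define ERASED    0xFF    /* erased flash reads as all-ones; keys must not be 0xFF */

static uint8_t flash[2][PAGE_SIZE];   /* two erase pages */
static int active = 0;                /* which page holds live data */

static void page_erase(int p) { memset(flash[p], ERASED, PAGE_SIZE); }
static void ee_init(void)    { page_erase(0); page_erase(1); active = 0; }

/* Each record is 2 bytes: key, value. The newest record for a key wins,
 * so an update is just an append - no erase needed until the page fills. */
static int ee_read(uint8_t key, uint8_t *val)
{
    int found = 0;
    for (int i = 0; i + 1 < PAGE_SIZE; i += 2)
        if (flash[active][i] == key) { *val = flash[active][i + 1]; found = 1; }
    return found;
}

static void ee_write(uint8_t key, uint8_t val)
{
    for (int i = 0; i + 1 < PAGE_SIZE; i += 2) {
        if (flash[active][i] == ERASED) {
            flash[active][i]     = key;
            flash[active][i + 1] = val;
            return;
        }
    }
    /* Active page full: copy the latest value of every key to the other
     * page, and only then erase the old one, so data survives a power
     * loss at any point during the swap. */
    int other = 1 - active, w = 0;
    page_erase(other);
    for (int i = 0; i + 1 < PAGE_SIZE; i += 2) {
        uint8_t k = flash[active][i], v, seen = 0;
        if (k == ERASED) continue;
        for (int j = 0; j < w; j += 2)
            if (flash[other][j] == k) seen = 1;
        if (!seen && ee_read(k, &v)) {
            flash[other][w] = k; flash[other][w + 1] = v; w += 2;
        }
    }
    page_erase(active);
    active = other;
    ee_write(key, val);   /* retry on the fresh page */
}
```

Note how the data items themselves wear-level for free: frequently updated keys just burn through the append area faster, triggering more page swaps, which is why the 10K cycle endurance of the ST Flash is such a tight budget compared to a real 100K cycle EEPROM.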
So in the end you may obtain the micro cheap, but you will end up adding an EEPROM or FRAM chip alongside it, and it will take some pins and an SPI or TWI peripheral, too. If you don't need the performance of the ARM (assuming you can manage to actually exploit it), you could possibly get your solution cheaper, and definitely smaller in PCB area, with one of those 8 / 16 bitters.
In my experience, the 8 / 16 bitters are simply better in these regards. You can just count on them: they do what they are meant to do, and when you have a problem, you have a very good chance of working it out within a reasonable timeframe.
Part of this is that they are programmed bare metal, without a definitive vendor provided library, so people using them have experience with how the micro actually works, and they share it. There is useful information on the Internet for when you happen to get stuck with something weird, and this is very important!
With the 32 bitter, if you get really stuck, you are on your own! That is a risk when planning schedules! The vendor won't help you; even if the company you work for buys the micro by the hundred thousands yearly, that may still be too small a volume for them to really care!
If you don't need the performance only a 32 bitter can offer, it might be better not to tangle yourself up in the complexity of those micros. You may be lucky and get the design rolled out on schedule just fine, but if you happen to stumble upon a significant problem, things can really get bleak.
Of course, when you need raw computational power, you need the 32 bitter, as only those are offered in the higher clock frequency ranges. Getting there requires all sorts of technologies which the 8 / 16 bitters don't have. The latter are simpler and dependable, but that's the trade-off.
For clock to clock performance, I found roughly the following to hold true:
- ATmegas have about the best 8 bit computational throughput per clock cycle, up to the ATmega1280 and its relatives. The ATmega2560 is slower in calls and returns, as they need a 3rd byte for the address. The AVR architecture is probably the best 8 bit architecture for a high level language.
- ATxmegas are worse clock to clock, especially due to taking 3 clocks for a load with displacement (2 on an ATmega), which is very common in compiler generated code (accessing members of a structure).
- Regarding PICs, I have significant experience with the PIC18 only; there, 4 clock pulses are taken per instruction. Moreover, the instruction set isn't well suited for high level languages, so they aren't particularly fast.
- The MSP430 isn't very fast in this regard either, with memory accesses taking several clock cycles; however, its architecture fits well with high level languages.
- The Cortex-M0+ core is quite limited; its throughput is roughly similar to that of an ATmega, but with fewer registers to work with. The latter can outperform it as long as no wide arithmetic is involved. The Cortex-M0 is a bit worse, the 3 stage pipeline taking its toll on conditionals.
- The Cortex-M3 and up have the full Thumb-2 instruction set, which adds a large set of useful instructions, vastly improving throughput (as long as the 32 bit wide instructions don't starve the fetch unit). They are clearly the best in clock to clock performance.
I don't have experience with ATtinys yet, particularly those recently released by Microchip. They contain some quite nice features, such as a flat 16 bit address space including both ROM and RAM, which is important for high level language compilers (people apparently just don't like dealing with pointers to different types of memory).
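The pointer problem looks like this on a classic AVR: flash and RAM are separate address spaces, so any routine that may read from either has to exist in two variants. The snippet below mocks the avr-libc pgm_read_byte() accessor with a plain dereference so it runs on a host; on a real classic AVR it issues an LPM instruction instead. With the flat address space of the new ATtinys, the single plain-pointer version would serve both:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mock: flash_mem[] stands in for program memory, and
 * pgm_read_byte() for the avr-libc accessor of the same name. */
static const uint8_t flash_mem[] = "stored in flash";
static uint8_t pgm_read_byte(const uint8_t *p) { return *p; /* real one: LPM */ }

/* Variant 1: string in RAM, plain pointer access */
static size_t strlen_ram(const char *s)
{
    size_t n = 0;
    while (s[n]) n++;
    return n;
}

/* Variant 2: identical logic, but every access must go through the
 * flash accessor - a duplicate of each such routine for each space */
static size_t strlen_flash(const uint8_t *s)
{
    size_t n = 0;
    while (pgm_read_byte(s + n)) n++;
    return n;
}
```

A flat address space removes this duplication entirely, which is why it matters so much to compilers (and to anyone porting generic C code).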
By maximum achievable clock frequency, among the 8 bitters you can likely get the most computation out of an ATxmega (32MHz), with the ATmega following behind (20MHz), though not by as much as the clocks alone would imply. The MSP430, even on 16 bit tasks at its 24MHz maximum, would likely barely keep up with the ATmega. Of course you can beat all of these by hurling even a Cortex-M0 at the task, through sheer clock frequency (50MHz or more, commonly); but the comparison is interesting to know if you want to do some maths (simple filtering, PID control) while staying on 8 / 16 bit territory.
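As an example of the kind of maths meant here, a basic integer-only PID step is well within reach of an 8 / 16 bitter at these clock rates. The fixed point scaling (Q8, so 256 represents a gain of 1.0), the struct layout and the names below are purely illustrative, not taken from any particular project:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal integer PID step - no floating point, so it maps onto the
 * multiply and add instructions an AVR or MSP430 actually has. */
typedef struct {
    int32_t kp, ki, kd;   /* gains in Q8 fixed point (256 = 1.0) */
    int32_t integ;        /* accumulated error (integral term) */
    int32_t prev_err;     /* previous error (for the derivative term) */
} pid_state_t;

static int32_t pid_step(pid_state_t *p, int32_t setpoint, int32_t measured)
{
    int32_t err   = setpoint - measured;
    int32_t deriv = err - p->prev_err;
    p->integ   += err;
    p->prev_err = err;
    /* sum of the P, I and D terms, scaled back down from Q8 */
    return (p->kp * err + p->ki * p->integ + p->kd * deriv) / 256;
}
```

A real controller would add integral clamping and output saturation, but even with those the per-step cost stays at a few dozen cycles on anything with a hardware multiplier.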
Thoughts on the role of 8 / 16 bitters
While it is true that 32 bit MCUs have come down to prices similar to the 8 / 16 bitters, I think the latter will survive quite well for now, pretty much for the characteristics mentioned above rather than anything else.
The fact that an MCU is 32 bit doesn't imply complexity by definition. If a manufacturer took it on and designed a Cortex-M0+ or (at last!) a RISC-V based micro with an intuitive peripheral set like contemporary 8 / 16 bitters have, with a sensible solution for nonvolatile storage, maybe capable of operating at 5 volts, and it could come down to similar prices, that could very well mark the end of the 8 / 16 bit era in MCU land. RISC-V is actually designed to be very economical in silicon area; I wouldn't be surprised if it worked well as a tiny and cheap MCU.
However until that happens, I think there are good reasons to stick with the 8 / 16 bit micros in applications which don't need the performance of the complex 32 bitter. They are just simpler and more dependable.