Learning on the edge | MIT News

Microcontrollers, miniature computers that can run simple commands, are the basis for billions of connected devices, from Internet of Things (IoT) gadgets to sensors in cars. But cheap, low-power microcontrollers have very limited memory and no operating system, which makes it difficult to train AI models on these intelligent edge devices, which operate independently of central computing resources.

Training a machine learning model on a smart edge device allows it to adapt to new data and make better predictions. For example, training a model on a smart keyboard could enable the keyboard to continuously learn from the user’s typing. However, the training process requires so much memory that it is usually performed using powerful computers in a data center, before the model is deployed to the device. This is costlier and raises privacy concerns, since user data must be sent to a central server.

To address this problem, researchers at MIT and the MIT-IBM Watson AI Lab have developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. Other training solutions designed for connected devices can use more than 500 megabytes of memory, which greatly exceeds the 256-kilobyte capacity of most microcontrollers (there are 1,024 kilobytes in one megabyte).

Smart algorithms and a framework developed by the researchers reduce the amount of computation required to train a model, making the process faster and more memory efficient. Their method can be used to train a machine learning model on a microcontroller in a matter of minutes.

This technique also preserves privacy by keeping data on the device, which can be especially important when the data is sensitive, such as in medical applications. It can also enable customization of a model based on the needs of users. Moreover, the framework maintains or improves the accuracy of the model compared to other training methods.

“Our study enables IoT devices to not only perform inference but also continuously update AI models with newly collected data, paving the way for lifelong learning on the device,” says Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and senior author of the paper describing this innovation.

Joining Han on the paper are co-lead authors and EECS doctoral students Ji Lin and Ligeng Zhu, as well as MIT postdocs Wei-Ming Chen and Wei-Chen Wang, and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems.

Han and his team previously addressed the memory and computational bottlenecks that arise when trying to run machine learning models on tiny edge devices, as part of their TinyML initiative.

Lightweight training

A common type of machine learning model is known as a neural network. Loosely based on the human brain, these models contain layers of interconnected nodes, or neurons, that process data to complete a task, such as recognizing people in images. The model must be trained first, which involves showing it millions of examples so it can learn the task. As it learns, the model increases or decreases the strength of the connections between neurons, which are known as weights.
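The idea of nudging weights up or down during training can be sketched with a toy, single-weight example. This is a generic gradient-descent illustration, not the researchers' method; all numbers are invented.

```python
# Toy sketch of how training adjusts a weight: one gradient-descent step on a
# single linear "neuron" y = w * x, fitting a target t.
w = 0.5                      # current connection strength (the weight)
x, t = 2.0, 3.0              # one training example: input and target output
lr = 0.1                     # learning rate (step size)

y = w * x                    # forward pass: the model's prediction
grad = 2 * (y - t) * x       # gradient of the squared error (y - t)**2 w.r.t. w
w = w - lr * grad            # update: nudge the weight to reduce the error
print(w)                     # the weight grows toward a better fit
```

Repeating this step over millions of examples and weights is what makes training so much more demanding than simply running a trained model.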

The model may undergo hundreds of updates as it learns, and the intermediate activations must be stored during each round. In a neural network, an activation is an intermediate result produced by a middle layer. Because there may be millions of weights and activations, Han explains, training a model requires much more memory than running a previously trained one.
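Some rough arithmetic shows why storing activations dominates training memory. The layer sizes below are invented for illustration, and the inference estimate is simplified (it keeps only the single largest activation buffer).

```python
# Illustrative estimate of training vs. inference activation memory for a
# small convolutional net. Activation sizes (values per layer) are made up.
activation_sizes = [32 * 32 * 16, 16 * 16 * 32, 8 * 8 * 64, 10]
BYTES_PER_VALUE = 4  # 32-bit floats

# Inference can reuse buffers, so (simplified) it only needs roughly the
# largest single activation alive at once.
inference_bytes = max(activation_sizes) * BYTES_PER_VALUE

# Backpropagation must keep every intermediate activation around until the
# backward pass, so training needs the sum of all of them.
training_bytes = sum(activation_sizes) * BYTES_PER_VALUE

print(f"inference: {inference_bytes / 1024:.1f} KB")  # 64.0 KB
print(f"training:  {training_bytes / 1024:.1f} KB")   # 112.0 KB
```

Even in this tiny example, training needs nearly twice the activation memory of inference; in real networks with many more layers, the gap is far larger.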

Han and his collaborators employed two algorithmic solutions to make the training process more efficient and less memory-intensive. The first, known as sparse update, uses an algorithm that identifies the most important weights to update during each round of training. The algorithm freezes the weights one at a time until it sees the accuracy dip below a set threshold, then it stops. The remaining weights are updated, while the activations corresponding to the frozen weights don’t need to be stored in memory.
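The greedy freezing loop described above can be sketched as follows. This is a minimal illustration of the idea, not the authors' actual algorithm; the layer names and the `evaluate_accuracy` callback are assumptions.

```python
# Sketch of greedy layer freezing for sparse updates: freeze layers one at a
# time, stopping once the accuracy drop would exceed a tolerance.
def select_layers_to_freeze(layers, evaluate_accuracy, max_drop=0.01):
    """Return the set of layers that can be frozen (left un-updated).

    `layers` is an ordered list of layer names; `evaluate_accuracy(frozen)`
    is an assumed callback returning validation accuracy when the given
    layers are frozen during training.
    """
    baseline = evaluate_accuracy(frozen=set())
    frozen = set()
    for layer in layers:
        candidate = frozen | {layer}
        if baseline - evaluate_accuracy(frozen=candidate) <= max_drop:
            frozen = candidate      # freezing this layer is safe
        else:
            break                   # accuracy dipped too far; stop freezing
    return frozen

# Toy usage: accuracy falls 0.004 per frozen layer, so with max_drop=0.01
# only the first two layers end up frozen.
acc = lambda frozen: 0.90 - 0.004 * len(frozen)
print(select_layers_to_freeze(["conv1", "conv2", "conv3"], acc))
```

Whatever ends up frozen pays off twice: those weights skip their update step, and their activations never need to be kept in memory.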

“Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” Han says.

Their second solution involves quantized training and simplifying the weights, which are typically 32 bits. An algorithm rounds the weights so they are only eight bits, through a process known as quantization, which cuts the amount of memory needed for both training and inference. Inference is the process of applying a model to a dataset and generating a prediction. The algorithm then applies a technique called quantization-aware scaling (QAS), which acts like a multiplier to adjust the ratio between weight and gradient, to avoid any drop in accuracy that may come from quantized training.
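The rounding step can be illustrated with a simple symmetric 8-bit quantizer. This is a generic sketch of quantization, not the paper's exact scheme, and it omits the QAS gradient-scaling step; the values are invented.

```python
import numpy as np

# Sketch of symmetric linear quantization: round 32-bit weights to 8-bit
# integers, with a scale that maps the largest weight magnitude to 127.
def quantize_int8(weights):
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate 32-bit weights from the 8-bit integers.
    return q.astype(np.float32) * scale

w = np.array([0.52, -1.27, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)                         # 8-bit integers in [-127, 127]
print(np.abs(w - w_hat).max())   # rounding error is below one scale step
```

Each weight shrinks from four bytes to one, a 4x memory saving, at the cost of small rounding errors that techniques such as QAS are designed to compensate for.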

The researchers developed a system, called a tiny training engine, that can run these algorithmic innovations on a simple microcontroller that lacks an operating system. This system changes the order of steps in the training process so more work is completed in the compilation stage, before the model is deployed to the edge device.

“We push a lot of the computation, such as auto-differentiation and graph optimization, to compile time. We also aggressively prune the redundant operators to support sparse updates. Once at runtime, we have much less workload to do on the device,” Han explains.
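One flavor of that compile-time pruning can be sketched as follows: once the set of layers to update is fixed, the backward-pass operators for frozen layers can be dropped from the training graph before it ever reaches the device. The graph format and layer names here are invented for illustration, not the tiny training engine's actual representation.

```python
# Sketch of compile-time pruning of backward-pass operators for a sparse
# update: keep all forward ops, but keep backward (gradient) ops only for
# the layers that will actually be updated on-device.
def prune_backward_ops(graph, trainable_layers):
    pruned = []
    for op in graph:
        if op["pass"] == "forward":
            pruned.append(op)                  # forward ops always run
        elif op["layer"] in trainable_layers:
            pruned.append(op)                  # backward op for a trained layer
        # backward ops for frozen layers are dropped entirely
    return pruned

graph = [
    {"pass": "forward",  "layer": "conv1"},
    {"pass": "forward",  "layer": "conv2"},
    {"pass": "backward", "layer": "conv2"},
    {"pass": "backward", "layer": "conv1"},
]
slim = prune_backward_ops(graph, trainable_layers={"conv2"})
print(len(slim))  # 3: both forward ops plus conv2's backward op
```

Because this pruning happens before deployment, the microcontroller never spends cycles or memory on gradient computations it will throw away.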

A successful speedup

Their optimization required only 157 kilobytes of memory to train a machine learning model on a microcontroller, whereas other techniques designed for lightweight training would still need between 300 and 600 megabytes.

They tested their framework by training a computer vision model to detect people in images. After only 10 minutes of training, it learned to complete the task successfully. Their method was able to train a model more than 20 times faster than other approaches.

Now that they have demonstrated the success of these techniques for computer vision models, the researchers want to apply them to language models and different types of data, such as time series data. At the same time, they want to use what they’ve learned to scale down larger models without sacrificing accuracy, which could help reduce the carbon footprint of training large-scale machine learning models.

“On-device adaptation and training of an AI model, especially on embedded controllers, is an open challenge. This research from MIT has not only successfully demonstrated the capabilities, but also opened up new possibilities for privacy-preserving device personalization in real time,” says Nilesh Jain, a principal engineer at Intel who was not involved in this work. “Innovations in the publication have broader applicability and will ignite new systems-algorithm co-design research.”

“On-device learning is the next major advance we are working toward for the connected intelligent edge. Professor Song Han’s group has shown great progress in demonstrating the effectiveness of edge training,” adds Jilei Hou, vice president and head of AI research at Qualcomm. “Qualcomm has awarded his team an Innovation Fellowship for further innovation and advancement in this area.”

This work is funded by the National Science Foundation, MIT-IBM Watson AI Lab, MIT AI Hardware Program, Amazon, Intel, Qualcomm, Ford Motor Company, and Google.
