A less wasteful way to train large language models, such as the GPT series, finishes in the same amount of time while using up to 30% less energy, according to a new study.
The approach could save enough energy to power 1.1 million US homes in 2026, based on Wells Fargo’s projections of AI power demand. It could also take a bite out of the International Monetary Fund’s prediction that data centers could account for 1.2% of the world’s carbon emissions by 2027—and the water demands that come with that energy use.
Some experts say that these costs could be outweighed by environmental benefits. They argue that AI could be a “game changer” for fighting climate change by identifying ways to optimize supply chains and the grid, manage our energy needs, and improve research on climate change.
Still, that doesn’t excuse squandering energy, and some of the power used to train AI has zero impact on training time and model accuracy.
“Why spend something when there’s no point?” says Mosharaf Chowdhury, a University of Michigan associate professor of computer science and engineering and the corresponding author of the study presented at the 30th Symposium on Operating Systems Principles.
“We can’t keep building bigger and bigger data centers because we won’t have the power to run them. If we can reduce the energy consumed by AI, we can reduce AI’s carbon footprint and cooling requirements and allow for more computation to fit within our current energy constraints.”
The energy waste arises when AI training work is divided unequally among GPUs, which are computer processors specialized for large data and graphics applications. Although it opens the door for waste, splitting the work is necessary for processing huge datasets.
“AI models today are so large, they cannot fit inside a single computer processor,” says Jae-Won Chung, a doctoral student in computer science and engineering and the first author of the study.
“They need to be divided into tens of thousands of processors to be trained, but dividing the models in perfectly equal sizes across all processors is practically impossible.”
Training jobs are difficult to split evenly because some tasks need to be grouped together on the same processor, much as each installment of a book series is kept together on an organized shelf. Depending on how the tasks are grouped, some processors might get stuck with the AI-training equivalent of the Encyclopedia Britannica while others get assigned a fantasy trilogy.
Because current training methods run each processor at top speed, processors with a lighter load will finish their calculations before other processors. This doesn’t speed up training, which isn’t complete until every processor finishes its job—but it is wasteful because faster calculations require more energy. In addition, problems such as faulty hardware or network delays create energy waste by slowing down a single processor’s computing speed.
To save energy, the researchers developed a software tool, called Perseus, that identifies the critical path: the series of subtasks that will take the longest time to complete. Perseus then slows down processors that aren’t on the critical path so that they all finish their jobs around the same time, eliminating unnecessary power use.
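The idea can be illustrated with a toy calculation. The sketch below is not Perseus itself; it assumes four hypothetical processors with made-up workloads, and it assumes a simplified energy model in which the energy spent per unit of work grows with the square of processor speed and idle power is ignored. Under those assumptions, it slows each lightly loaded processor just enough to finish alongside the most heavily loaded one.

```python
def plan_speeds(work, max_speed=1.0):
    """Pick a speed for each processor so all finish with the slowest one."""
    # The processor with the most work sets the pace (the critical path).
    deadline = max(w / max_speed for w in work.values())
    # Everyone else runs only as fast as needed to meet that deadline.
    speeds = {name: w / deadline for name, w in work.items()}
    return speeds, deadline


def energy(work, speeds):
    """Toy model (an assumption): energy per unit of work scales with speed**2."""
    return sum(w * speeds[name] ** 2 for name, w in work.items())


# Hypothetical, uneven workloads in arbitrary units.
work = {"gpu0": 10.0, "gpu1": 7.0, "gpu2": 8.0, "gpu3": 5.0}

full_speed = {name: 1.0 for name in work}
speeds, deadline = plan_speeds(work)

baseline = energy(work, full_speed)  # every processor at top speed
planned = energy(work, speeds)       # lighter loads slowed down

print(f"iteration time: {deadline}")
print(f"energy at full speed: {baseline:.2f}, with slowdown: {planned:.2f}")
```

In this toy setup the iteration time is unchanged because the busiest processor still runs at full speed, while the others use less energy. The size of the saving here depends entirely on the invented workloads and the assumed energy model, so it should not be read as a measurement.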
“Reducing the power cost of AI can have important implications for equitable AI access,” Chowdhury says. “If a country doesn’t have enough power to run a big model, they might need to use services from far away, or be stuck running smaller, less accurate models. This gap could further perpetuate disparity between different communities.”
The team tested Perseus by training GPT-3, three other large language models, and one computer vision model.
Perseus is an open-source tool available as part of Zeus, a tool for measuring and optimizing AI energy consumption.
Funding for the research came from the National Science Foundation, Dutch Research Council (NWO) Talent Programme, VMware, Mozilla Foundation, Salesforce, and Kwanjeong Educational Foundation. Chameleon Cloud and CloudLab supported the research by providing computational resources.
Source: University of Michigan