Are data & AI projects environmentally compatible?

Written by:

Charlotte Ledoux

Estée Cogneau

Reading time: 3 min


It is very difficult to evaluate the carbon footprint of any project because of the many factors involved. The same applies to data and AI projects, whose impact has long been underestimated even though it weighs heavily on our environment.

Two elements appear to be the main contributors to this environmental cost, and both are growing exponentially, to the concern of environmentalists: machine learning models and data centers.

Algorithms and their massive calculations

To understand this system, we must first look at how machine learning models work. This method is now part of our daily lives. Machine learning draws its insights from user experience: by analyzing the information and data collected across many uses, the model learns to predict the needs it is trained for, such as a music playlist matching our preferences or photo folders grouping our family members. The more data the model is fed, the better it generally performs. Naturally, all this accumulated information, computation and training consumes energy and produces CO2. For example, the computations required for one natural language processing algorithm have been estimated at the equivalent of 355 years of calculation on a single processor. In terms of CO2 emissions, this corresponds to driving a new car about 700,000 km, or approximately 114 tons of CO2.
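The car comparison implies a specific emission factor per kilometre. A quick back-of-the-envelope check, using only the two figures quoted in the text (114 tons of CO2 and 700,000 km), recovers it; the variable names are illustrative.

```python
# Figures quoted in the article
training_emissions_kg = 114_000   # ~114 tons of CO2
equivalent_distance_km = 700_000  # distance driven by a new car

# Implied average emission factor of the car, in grams of CO2 per km
grams_per_km = training_emissions_kg * 1000 / equivalent_distance_km
print(f"{grams_per_km:.0f} g CO2/km")
```

In other words, the comparison assumes a car emitting roughly 163 g of CO2 per kilometre.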

Even more data storage needs

Moreover, all this data, together with the models built from thousands of calculations, must be stored permanently in dedicated facilities that consume energy around the clock, so that the models remain available anywhere, at any time. According to a U.S. study by Energy Innovation, the largest data centers require a power capacity of more than 100 megawatts, enough to supply about 80,000 households.
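The Energy Innovation comparison implies an average power draw per household. The sketch below simply divides the two quoted figures (100 MW and 80,000 households); the annual figure additionally assumes, hypothetically, that this power is drawn continuously for a full year.

```python
# Figures quoted in the article (Energy Innovation study)
data_center_capacity_mw = 100
households = 80_000

# Average power available per household, in kilowatts
kw_per_household = data_center_capacity_mw * 1000 / households

# Assumption: the capacity is drawn continuously for a whole (non-leap) year
hours_per_year = 24 * 365
kwh_per_household_per_year = kw_per_household * hours_per_year

print(f"{kw_per_household:.2f} kW per household, "
      f"about {kwh_per_household_per_year:,.0f} kWh per year")
```

Under that assumption, the comparison corresponds to about 1.25 kW, or roughly 10,950 kWh per year, per household.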

Implementing AI and data projects therefore has a real environmental impact, and the multiplication of these models only increases energy consumption. However, shutting these processes down hardly seems compatible with today's ultra-connected world.

So how can we limit the carbon footprint of our data and AI projects without hurting our company's competitiveness?

Pooling computations

Whether because of siloed subsidiaries or processes involving several players, such as the successive stages of a supply chain or the advertising industry, nearly identical models keep cropping up. Why? Because every actor has the same goals: meeting demand, improving the customer experience or getting a 360° view of the customer. Companies tend to build several models to compensate for their lack of information, even when they share common objectives with their partners. Beyond their limited efficiency, these duplicated models lead to overconsumption: when the same model is built several times, the computations executed and the data stored are multiplied accordingly.

One way to limit this duplication is to establish new forms of cooperation between the actors who want to run an AI project. The aim is to pool the data of the different entities, not only to build a more accurate model but also to reduce each partner's carbon footprint.

Choosing responsible hosting

As mentioned earlier, storing all this information and running ever more powerful AI models has a significant environmental impact. For this reason, some data center operators have worked to limit their carbon footprint. Scaleway, a partner of Vallai, is committed to operating its data centers sustainably and responsibly, for example by recycling its hardware and carefully selecting its water and electricity suppliers. Scaleway also aims to be fully transparent with its customers about its environmental footprint, for instance by giving real-time access to its data centers' Power Usage Effectiveness (PUE).
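PUE is the ratio of the total energy consumed by a data center facility to the energy delivered to its IT equipment (servers, storage, network). A value of 1.0 would mean no overhead at all. The minimal sketch below computes it; the function name and the monthly readings are hypothetical and purely illustrative, not Scaleway figures.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / energy delivered to IT equipment.

    Anything above 1.0 is overhead such as cooling, power conversion and lighting.
    """
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly readings for one data center (illustrative numbers only)
pue = power_usage_effectiveness(total_facility_kwh=1_300_000, it_equipment_kwh=1_000_000)

# Share of the total energy that does not reach the IT equipment
overhead_share = 1 - 1 / pue
print(f"PUE = {pue:.2f}, overhead = {overhead_share:.0%}")
```

The closer the PUE is to 1.0, the smaller the share of electricity spent on anything other than computing itself.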

The quest for sustainable ecosystems

The life cycle of a product strongly determines its carbon impact, since that impact is amortized over time. Using a car for short trips is more damaging to the environment than using it for long ones. The same goes for algorithms, models and data storage: by reusing them and focusing on managing existing assets rather than multiplying them, their environmental cost is amortized over time. Data ecosystems in which resources and knowledge circulate in a secure and regulated manner benefit not only the environment but also innovation. The economic goal is to create as much value as possible, for as long as possible, while consuming as little energy as possible.
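The amortization argument can be made concrete with a simple average: a one-off cost (for example, training a model) plus a running cost per use, divided by the number of uses. This is a minimal sketch under that simplified cost model; the function name and all numbers are hypothetical.

```python
def footprint_per_use(one_off_kg: float, per_use_kg: float, uses: int) -> float:
    """Average CO2 per use when a one-off cost (e.g. training a model) is
    spread over `uses` uses, each of which also has its own running cost."""
    if uses <= 0:
        raise ValueError("uses must be positive")
    return (one_off_kg + per_use_kg * uses) / uses


# Hypothetical numbers: 1,000 kg CO2 to train a model, 0.01 kg CO2 per use
for uses in (10, 1_000, 100_000):
    print(uses, round(footprint_per_use(1_000, 0.01, uses), 3))
```

As the number of uses grows, the average footprint per use falls towards the running cost alone, which is why reusing an existing model is usually cheaper than training a new one.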

When building data/AI projects, it is necessary to evaluate the environmental cost of the operation and to look for ways to reduce the CO2 emissions involved. A very first step is to question whether AI is even necessary for the problem at hand: often, simpler approaches exist that require far less computing power.

Finally, one of our objectives as a vendor is to provide indicators of the CO2 emissions of each data project managed through our platform, to increase user awareness and accountability. Yes, our technical roadmap is ambitious!
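To illustrate what such an indicator could compute, here is a minimal sketch of a common estimation approach: energy drawn by the hardware, scaled by the data center's PUE, multiplied by the carbon intensity of the local grid. This is not Vallai's implementation; the `ComputeJob` class, the function name and all numeric values are hypothetical, and embodied emissions from hardware manufacturing are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ComputeJob:
    """Hypothetical description of one data/AI workload."""
    avg_power_kw: float         # average power drawn by the servers/GPUs used
    runtime_hours: float        # wall-clock duration of the job
    pue: float                  # Power Usage Effectiveness of the hosting data center
    grid_kg_co2_per_kwh: float  # carbon intensity of the local electricity grid


def estimate_emissions_kg(job: ComputeJob) -> float:
    """Operational emissions only: IT energy scaled by PUE, times grid intensity."""
    facility_kwh = job.avg_power_kw * job.runtime_hours * job.pue
    return facility_kwh * job.grid_kg_co2_per_kwh


# Illustrative values only: the same job on a low-carbon vs. a carbon-intensive grid
low = ComputeJob(avg_power_kw=2.0, runtime_hours=48, pue=1.3, grid_kg_co2_per_kwh=0.05)
high = ComputeJob(avg_power_kw=2.0, runtime_hours=48, pue=1.3, grid_kg_co2_per_kwh=0.4)
for label, job in (("low-carbon grid", low), ("carbon-intensive grid", high)):
    print(f"{label}: {estimate_emissions_kg(job):.1f} kg CO2e")
```

With identical hardware and runtime, the hosting choice alone (grid carbon intensity and PUE) can change the estimate several-fold.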

Sources: