Data exploration of the Climate TRACE dataset#
In this chapter, we look at the dataset and we answer a few questions:
which countries are the biggest emitters?
what sectors are the most responsible for emissions?
In a second part, we compare two different views on emissions:
the source view, which looks from a bottom up view at all the sources (factories, mines, farms, …)
the country view, which is a top-down approach and derives many values from aggregate economic activity, for example how many tons of coal were burned annually. This is the official reporting method followed by countries to report their emissions to the United Nations as part of the Paris Agreement.
The data has already been prepared from the original Climate TRACE dataset. If you want to understand the preprocessing, read the chapter Ingestion.
%load_ext autoreload
%autoreload 2
import logging
logging.basicConfig(level=logging.INFO)
We import all the libraries that we will use in this notebook:
the Polars library, a very fast package with a clear interface.
the Plotly Express visualization library
the ctrace package (included in this repository), which contains tools to read and understand the Climate TRACE data.
import polars as pl
import plotly.io
plotly.io.templates.default = "plotly_white"
import plotly.express as px
from ctrace.constants import * # We import many useful constants
import ctrace as ct
Country emissions#
The country emissions are available through the read_country_emissions() function. By default, this function downloads the latest set of reports (currently V3, published in November 2024). If the data has already been downloaded, it reuses the local copy.
The format being returned is a Polars DataFrame. This format will be very familiar to people used to working with Pandas, Spark or R dataframes. The GAS_LIST argument indicates that we want to load all the gases available.
cedf = ct.read_country_emissions(GAS_LIST)
cedf.head(3)
iso3_country | start_time | end_time | gas | sector | subsector | emissions_quantity | emissions_quantity_units | temporal_granularity | created_date | modified_date |
---|---|---|---|---|---|---|---|---|---|---|
enum | datetime[ms, UTC] | datetime[ms, UTC] | enum | enum | enum | f64 | cat | enum | datetime[ms, UTC] | datetime[ms, UTC] |
"ABW" | 2015-01-01 00:00:00 UTC | 2015-12-31 00:00:00 UTC | "co2" | "fossil-fuel-operations" | "other-fossil-fuel-operations" | 0.0 | null | "annual" | null | null |
"ABW" | 2015-01-01 00:00:00 UTC | 2015-12-31 00:00:00 UTC | "co2" | "mineral-extraction" | "bauxite-mining" | 0.0 | null | "annual" | null | null |
"ABW" | 2015-01-01 00:00:00 UTC | 2015-12-31 00:00:00 UTC | "co2" | "transportation" | "domestic-shipping" | 90613.375994 | null | "annual" | null | null |
Understanding the data format#
These rows are a mouthful to digest. Here is how the emissions data is structured.
This data is highly structured, which is reflected in the schema itself. The schema has many enumerations (iso3_country, gas, …), where only a few values are expected. For example, all countries are represented by their official ISO 3166 3-letter country codes. Internally, Polars can assign small integers to represent them all, which consumes less memory and speeds up manipulations. This has a further advantage: since we know the values to expect, we can give them names in the code such as CH4, CO2, …. We can then use standard software tools to find and update references to the various gases as we work with them.
TODO
move all this part in its own notebook, this is about technical details
cedf.schema
Schema([('iso3_country',
Enum(categories=['ABW', 'AFG', 'AGO', 'AIA', 'ALA', 'ALB', 'AND', 'ARE', 'ARG', 'ARM', 'ASM', 'ATA', 'ATF', 'ATG', 'AUS', 'AUT', 'AZE', 'BDI', 'BEL', 'BEN', 'BES', 'BFA', 'BGD', 'BGR', 'BHR', 'BHS', 'BIH', 'BLM', 'BLR', 'BLZ', 'BMU', 'BOL', 'BRA', 'BRB', 'BRN', 'BTN', 'BVT', 'BWA', 'CAF', 'CAN', 'CCK', 'CHE', 'CHL', 'CHN', 'CIV', 'CMR', 'COD', 'COG', 'COK', 'COL', 'COM', 'CPV', 'CRI', 'CUB', 'CUW', 'CXR', 'CYM', 'CYP', 'CZE', 'DEU', 'DJI', 'DMA', 'DNK', 'DOM', 'DZA', 'ECU', 'EGY', 'ERI', 'ESH', 'ESP', 'EST', 'ETH', 'FIN', 'FJI', 'FLK', 'FRA', 'FRO', 'FSM', 'GAB', 'GBR', 'GEO', 'GGY', 'GHA', 'GIB', 'GIN', 'GLP', 'GMB', 'GNB', 'GNQ', 'GRC', 'GRD', 'GRL', 'GTM', 'GUF', 'GUM', 'GUY', 'HKG', 'HMD', 'HND', 'HRV', 'HTI', 'HUN', 'IDN', 'IMN', 'IND', 'IOT', 'IRL', 'IRN', 'IRQ', 'ISL', 'ISR', 'ITA', 'JAM', 'JEY', 'JOR', 'JPN', 'KAZ', 'KEN', 'KGZ', 'KHM', 'KIR', 'KNA', 'KOR', 'KWT', 'LAO', 'LBN', 'LBR', 'LBY', 'LCA', 'LIE', 'LKA', 'LSO', 'LTU', 'LUX', 'LVA', 'MAC', 'MAF', 'MAR', 'MCO', 'MDA', 'MDG', 'MDV', 'MEX', 'MHL', 'MKD', 'MLI', 'MLT', 'MMR', 'MNE', 'MNG', 'MNP', 'MOZ', 'MRT', 'MSR', 'MTQ', 'MUS', 'MWI', 'MYS', 'MYT', 'NAM', 'NCL', 'NER', 'NFK', 'NGA', 'NIC', 'NIU', 'NLD', 'NOR', 'NPL', 'NRU', 'NZL', 'OMN', 'PAK', 'PAN', 'PCN', 'PER', 'PHL', 'PLW', 'PNG', 'POL', 'PRI', 'PRK', 'PRT', 'PRY', 'PSE', 'PYF', 'QAT', 'REU', 'ROU', 'RUS', 'RWA', 'SAU', 'SDN', 'SEN', 'SGP', 'SGS', 'SHN', 'SJM', 'SLB', 'SLE', 'SLV', 'SMR', 'SOM', 'SPM', 'SRB', 'SSD', 'STP', 'SUR', 'SVK', 'SVN', 'SWE', 'SWZ', 'SXM', 'SYC', 'SYR', 'TCA', 'TCD', 'TGO', 'THA', 'TJK', 'TKL', 'TKM', 'TLS', 'TON', 'TTO', 'TUN', 'TUR', 'TUV', 'TWN', 'TZA', 'UGA', 'UKR', 'UMI', 'URY', 'USA', 'UZB', 'VAT', 'VCT', 'VEN', 'VGB', 'VIR', 'VNM', 'VUT', 'WLF', 'WSM', 'XKX', 'YEM', 'ZAF', 'ZMB', 'ZWE', 'ZNC', 'UNK', 'SCG', 'XAD'])),
('start_time', Datetime(time_unit='ms', time_zone='UTC')),
('end_time', Datetime(time_unit='ms', time_zone='UTC')),
('gas', Enum(categories=['co2', 'ch4', 'n2o', 'co2e_100yr'])),
('sector',
Enum(categories=['agriculture', 'buildings', 'fluorinated-gases', 'forestry-and-land-use', 'fossil-fuel-operations', 'manufacturing', 'mineral-extraction', 'power', 'transportation', 'waste'])),
('subsector',
Enum(categories=['aluminum', 'bauxite-mining', 'biological-treatment-of-solid-waste-and-biogenic', 'cement', 'chemicals', 'coal-mining', 'copper-mining', 'crop-residues', 'cropland-fires', 'domestic-aviation', 'domestic-shipping', 'domestic-shipping-ship', 'domestic-wastewater-treatment-and-discharge', 'electricity-generation', 'enteric-fermentation-cattle-operation', 'enteric-fermentation-cattle-pasture', 'enteric-fermentation-other', 'fluorinated-gases', 'food-beverage-tobacco', 'forest-land-clearing', 'forest-land-degradation', 'forest-land-fires', 'glass', 'heat-plants', 'incineration-and-open-burning-of-waste', 'industrial-wastewater-treatment-and-discharge', 'international-aviation', 'international-shipping', 'international-shipping-ship', 'iron-and-steel', 'iron-mining', 'lime', 'manure-applied-to-soils', 'manure-left-on-pasture-cattle', 'manure-management-cattle-operation', 'manure-management-other', 'net-forest-land', 'net-shrubgrass', 'net-wetland', 'non-residential-onsite-fuel-usage', 'oil-and-gas-production', 'oil-and-gas-refining', 'oil-and-gas-transport', 'other-agricultural-soil-emissions', 'other-chemicals', 'other-energy-use', 'other-fossil-fuel-operations', 'other-manufacturing', 'other-metals', 'other-mining-quarrying', 'other-onsite-fuel-usage', 'other-transport', 'petrochemical-steam-cracking', 'pulp-and-paper', 'railways', 'removals', 'residential-onsite-fuel-usage', 'rice-cultivation', 'road-transportation', 'road-transportation-road-segment', 'rock-quarrying', 'sand-quarrying', 'shrubgrass-fires', 'soil-organic-carbon', 'solid-fuel-transformation', 'solid-waste-disposal', 'synthetic-fertilizer-application', 'textiles-leather-apparel', 'water-reservoirs', 'wetland-fires', 'wood-and-wood-products'])),
('emissions_quantity', Float64),
('emissions_quantity_units', Categorical(ordering='physical')),
('temporal_granularity',
Enum(categories=['annual', 'other', 'month', 'week', 'day', 'hour'])),
('created_date', Datetime(time_unit='ms', time_zone='UTC')),
('modified_date', Datetime(time_unit='ms', time_zone='UTC'))])
Another trick is to predefine all the DataFrame columns that we are going to manipulate. For example, there is a column called gas that refers to the gas being tabulated.
Coming from the Pandas world, it would be normal to refer to this column:
cedf_pdf[cedf_pdf["gas"]=="ch4"]
Polars offers a more succinct syntax by considering an abstract column, for example for the gas:
pl.col("gas")
That pl.col("gas") expression refers to any dataframe column called gas. The select method, which operates on a Polars dataframe, takes such column expressions as instructions to select specific columns.
cedf.select(pl.col("gas"));
The ctrace library predefines all the columns defined in the Climate TRACE datasets, all starting with the c_ prefix:
ct.constants.c_gas
This reduces the chance of making a typo, allows us to use the autocompletion features of editors, and it will catch any mistake before executing the code. Selecting all the data related to CO2 is simply:
# cedf.filter(c_gas == CO2);
Using a different data processing framework
This notebook is using Polars. Are you more familiar with other frameworks such as pandas, PySpark, Modin, DuckDB, or R’s dataframes? No problem. Polars can convert directly to most of these other representations. Here is an example for pandas below. You can also directly point to the underlying Parquet representation on the HuggingFace Hub of the project.
# Use pandas instead:
cedf_pandas = ct.read_country_emissions(GAS_LIST).to_pandas()
cedf_pandas.head(3)
iso3_country | start_time | end_time | gas | sector | subsector | emissions_quantity | emissions_quantity_units | temporal_granularity | created_date | modified_date | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | ABW | 2015-01-01 00:00:00+00:00 | 2015-12-31 00:00:00+00:00 | co2 | fossil-fuel-operations | other-fossil-fuel-operations | 0.000000 | NaN | annual | NaT | NaT |
1 | ABW | 2015-01-01 00:00:00+00:00 | 2015-12-31 00:00:00+00:00 | co2 | mineral-extraction | bauxite-mining | 0.000000 | NaN | annual | NaT | NaT |
2 | ABW | 2015-01-01 00:00:00+00:00 | 2015-12-31 00:00:00+00:00 | co2 | transportation | domestic-shipping | 90613.375994 | NaN | annual | NaT | NaT |
Country checks#
In this section, we identify the biggest emitters and sort out a few unexpected insights.
We focus this analysis on the year 2023 and on the aggregated emissions for carbon dioxide (CO2). You can look at how the results change when you focus on another specific gas (co2, ch4, …) or on the global warming potential (co2e_100yr).
gas = CO2 #gas = CO2E_100YR
year = 2023
cedf_gy = cedf.filter(c_gas == gas).filter(c_start_time.dt.year()==year)
First question: how much do we emit? About 50 Gt of CO2 (gigatonnes of CO2). Note the first wrinkle already: this figure is net of the absorption by vegetation. If we did not take into account the beneficial action of our friends the trees, our emissions would be even higher. Trees will come back as a complicated topic.
Here are the net emissions, taking into account all the contributions of the forestry and land use sector: about 50.6 Gt of CO2.
cedf_gy.select(c_emissions_quantity.sum())
emissions_quantity |
---|
f64 |
5.0609e10 |
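The raw output is easier to read once converted to gigatonnes. The sketch below assumes the quantities are expressed in tonnes (the units column is null in the preview above, so this is an assumption, consistent with the "about 50 Gt" reading in the text).

```python
# Assumption: emissions_quantity is expressed in tonnes.
TONNES_PER_GIGATONNE = 1e9


def to_gigatonnes(tonnes: float) -> float:
    """Convert a quantity in tonnes to gigatonnes."""
    return tonnes / TONNES_PER_GIGATONNE


# Value copied from the output above.
net_total_gt = to_gigatonnes(5.0609e10)
print(f"{net_total_gt:.1f} Gt")
```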
Looking at the increase by year, we see that the emissions of CO2 are still increasing.
px.line(
cedf
.filter(c_emissions_quantity > 0)
.group_by(c_start_time.dt.year(), c_gas)
.agg(c_emissions_quantity.sum())
.sort(by=c_start_time),
x="start_time", y="emissions_quantity",
color="gas")
Most reports do not account for forestry and land use, because it is very hard to measure and because there is still some debate on what exactly to report in that category. Excluding this category, we get an amount of 46 Gt CO2 equivalent, which is what Climate TRACE reports on its website:
(cedf_gy
.filter(c_sector != FORESTRY_AND_LAND_USE)
.select(c_emissions_quantity.sum()))
emissions_quantity |
---|
f64 |
4.6174e10 |
For now, we will still include forestry and land use. You will see how much this can skew the figures and why it is at the center of many discussions.
Here is the same total, split between sinks (negative quantities) and sources (positive quantities):
(
cedf_gy
.select([c_emissions_quantity])
.with_columns((c_emissions_quantity > 0).alias("is_source"))
.group_by("is_source")
.agg(c_emissions_quantity.sum())
.drop_nulls()
)
is_source | emissions_quantity |
---|---|
bool | f64 |
false | -1.5872e10 |
true | 6.6481e10 |
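As a quick sanity check, the sink and source totals should add up to the net total computed earlier. The values below are copied from the outputs shown in this section.

```python
import math

# Values copied from the outputs above.
sinks = -1.5872e10      # rows with emissions_quantity <= 0
sources = 6.6481e10     # rows with emissions_quantity > 0
net_total = 5.0609e10   # sum over all rows

# The displayed values are rounded, so compare with a tolerance.
is_consistent = math.isclose(sinks + sources, net_total, rel_tol=1e-4)
print(is_consistent)
```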
Look at the top emitters for all the emissions sources. The usual suspects come at the top (China, USA, Russia). However, it is very important to keep the orders of magnitude in mind: China has more emissions than the next 3 countries (USA, Russia, India) combined.
px.bar(cedf_gy
.filter(c_emissions_quantity > 0)
.group_by(c_iso3_country)
.agg(c_emissions_quantity.sum())
.sort([c_emissions_quantity], descending=True)
.head(20)
,x=ISO3_COUNTRY, y=EMISSIONS_QUANTITY,log_y=False)
Looking at each sector quickly gives a more nuanced picture and shows the differences between the various countries.
the Chinese electricity sector emits more than Western Europe combined. This is not to say that Europeans should not make efforts (they should!). But these efforts can take many forms, such as incentivizing China to reduce the role of coal in its domestic supply of electricity.
the emissions from the oil and gas sector in Russia are larger than all the emissions of Germany. These emissions are typically quick wins (fixing leaks on old pipes).
Zimbabwe and Mozambique together have as many gross emissions as Japan! Clearing forested areas is very bad for the climate and has only short-term economic benefits. Finding fairer ways to preserve natural resources would also go a long way towards helping developing countries achieve their goals.
France has a larger economy than Russia (by nominal GDP), yet it contributes much less. This view emphasizes direct emissions (where gases are emitted), not the indirect emissions from consumption.
CTODO
Check these conclusions with experts
px.bar(cedf_gy
.group_by(c_iso3_country, c_subsector)
.agg(c_emissions_quantity.sum())
.sort([c_emissions_quantity], descending=True)
.filter(c_iso3_country.is_in([
"CHN", "USA", "IND", "RUS","MOZ","FRA", "NLD"]))
,x=ISO3_COUNTRY, y=EMISSIONS_QUANTITY,color=SUBSECTOR,log_y=False)
Source checks#
Let’s look now at all the sources tracked by Climate TRACE.
We load the data first.
Note
Technical
We load the data using scan_parquet. This instructs Polars to delay the actual loading in memory until requested. The amount of memory necessary will then be minimal, even if the dataset itself is 4GB. We reset the schema; in particular, we add all the information about enumerations. This provides better type checks and helps Polars with optimizing its queries.
You can see that Polars has not done any processing if you attempt to display the data:
sdf_gy = ct.read_source_emissions(gas=gas, year=year)
sdf_gy
NAIVE QUERY PLAN
run LazyFrame.show_graph() to see the optimized version
How much is emitted according to the source tracking? This number is significantly higher than what was estimated per country (57 Gt of CO2 instead of 46 Gt of CO2).
The forestry and land use category is notoriously hard to estimate, so we are going to leave it aside for now:
(sdf_gy
.filter(c_sector != FORESTRY_AND_LAND_USE)
.select(c_emissions_quantity.sum())
.collect()
)
emissions_quantity |
---|
f64 |
5.7449e10 |
The biggest sources#
What are the biggest sources of emissions? Fossil fuel operations (the Permian Basin in Texas, oil fields in Russia) are among the largest sources. Agriculture also plays a huge role, especially in Brazil and India.
(sdf_gy
.filter(c_emissions_quantity > 0)
.filter(c_sector != FORESTRY_AND_LAND_USE)
.group_by(c_source_id)
.agg(c_iso3_country.first(), c_sector.first(), c_subsector.first(), c_emissions_quantity.sum(), c_source_name.first())
.top_k(50, by=c_emissions_quantity)
.collect(streaming=True))
source_id | iso3_country | sector | subsector | emissions_quantity | source_name |
---|---|---|---|---|---|
u64 | enum | enum | enum | f64 | str |
1241948 | "CHN" | "buildings" | "residential-onsite-fuel-usage" | 5.9804e8 | "China" |
1242200 | "USA" | "buildings" | "residential-onsite-fuel-usage" | 4.6791e8 | "United States" |
3588448 | "RUS" | "fossil-fuel-operations" | "oil-and-gas-transport" | 2.5986e8 | "Russian Federation_West Siberi… |
10720469 | "BRA" | "agriculture" | "cropland-fires" | 2.2666e8 | "Brazil" |
10720302 | "IND" | "agriculture" | "cropland-fires" | 2.1886e8 | "India" |
… | … | … | … | … | … |
3597677 | "BRA" | "transportation" | "road-transportation" | 4.6394e7 | "São Paulo" |
3588663 | "IRQ" | "fossil-fuel-operations" | "oil-and-gas-production" | 4.6277e7 | "Iraq_Widyan - North Arabian Gu… |
3600747 | "USA" | "transportation" | "road-transportation" | 4.5956e7 | "New York" |
3588498 | "USA" | "fossil-fuel-operations" | "oil-and-gas-transport" | 4.5931e7 | "United States_Appalachian_Shal… |
3600726 | "USA" | "transportation" | "road-transportation" | 4.5479e7 | "Illinois" |
Number of tracked sources#
How many sources were tracked last year?
We get more than 1.2 million if the forestry and land use changes are included.
Technical:
Notice that we did not need to ingest the data in a database, and yet this query takes less than a second. Pretty good for a decently large dataset! Try to reproduce this analysis in Pandas and see what happens.
CTODO
The CT map seems to report 1.8M.
(sdf_gy.select(c_source_id.unique_counts())
.count()
.collect()
.item() # Using .item() to have a single number
)
1265375
Excluding them, we get close to the number officially reported by Climate TRACE (748k).
(sdf_gy.filter(c_sector != FORESTRY_AND_LAND_USE)
.select(c_source_id.unique_counts())
.count()
.collect()
.item()
)
749594
Which categories do these records come from?
This shows the full diversity of the sources to consider. The most numerous records concern ships, followed by various treatment plants for cities, forest statistics, etc.
px.bar(sdf_gy
.select(c_subsector.value_counts(sort=True))
.collect()
.unnest(SUBSECTOR),
x=SUBSECTOR, y="count", log_y=True
)
Looking at a few examples of cement plants, you can see how the data is organized for each source:
an identifier, along with country, UNFCCC sectorial categorization, time of data collection
the gas considered
the main methodology to get the emission: typically a certain quantity of interest, called activity (here tonnes of cement produced), multiplied by an emission factor
more details about the source itself: name, type, position
some extra tabular information (always strings in this dataset)
qualitative confidence values for each of the values
(sdf_gy
.filter(c_sector == MANUFACTURING)
.filter(c_subsector == CEMENT)
.head(3)
.collect()
)
source_id | iso3_country | sector | subsector | original_inventory_sector | start_time | end_time | temporal_granularity | gas | emissions_quantity | emissions_factor | emissions_factor_units | capacity | capacity_units | capacity_factor | activity | activity_units | created_date | modified_date | source_name | source_type | lat | lon | other1 | other2 | other3 | other4 | other5 | other6 | other7 | other8 | other9 | other10 | other11 | other12 | other1_def | other2_def | other3_def | other4_def | other5_def | other6_def | other7_def | other8_def | other9_def | other10_def | other11_def | other12_def | geometry_ref | conf_source_type | conf_capacity | conf_capacity_factor | conf_activity | conf_emissions_factor | conf_emissions_quantity | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
u64 | enum | enum | enum | str | datetime[μs, UTC] | datetime[μs, UTC] | enum | enum | f64 | f64 | str | f64 | str | f64 | f64 | str | datetime[μs, UTC] | datetime[μs, UTC] | str | str | f64 | f64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | enum | enum | enum | enum | enum | enum | i64 |
1895560 | "CHN" | "manufacturing" | "cement" | null | 2023-05-01 00:00:00 UTC | 2023-05-31 00:00:00 UTC | "month" | "co2" | 44912.138625 | 0.54 | "t of CO2 per t of cement" | 212917.0 | "t of cement" | 0.390625 | 83170.627083 | "t of cement" | 2024-08-21 00:00:00 UTC | null | "Anyang Hubo Clinker Co Ltd Hen… | "integrated dry" | 36.085841 | 114.068585 | "0.5952506500000001" | "49507.36983219091" | "0.34" | "28278.013208292858" | "0.2" | "16634.125416642855" | "0.091" | "7568.527064572498" | "0.60715" | "satellite" | null | null | "Direct and Indirect emissions … | "Direct and Indirect emissions:… | "Calcination emissions factor (… | "Calcination emissions (t of CO… | "Fuel emissions factor (t of CO… | "Fuel emissions (t of CO2)" | "electricity_use_factor (MWh pe… | "electricity_use (MWh)" | "grid_emissions_intensity (t CO… | "model_methodology (e.g satelli… | null | null | "trace_1895560" | "high" | "high" | "low" | "low" | "low" | "low" | 2023 |
1895560 | "CHN" | "manufacturing" | "cement" | null | 2023-06-01 00:00:00 UTC | 2023-06-30 00:00:00 UTC | "month" | "co2" | 105667.939179 | 0.54 | "t of CO2 per t of cement" | 212917.0 | "t of cement" | 0.91905 | 195681.36885 | "t of cement" | 2024-08-21 00:00:00 UTC | null | "Anyang Hubo Clinker Co Ltd Hen… | "integrated dry" | 36.085841 | 114.068585 | "0.5973491099999999" | "116890.0915261292" | "0.34" | "66531.665409" | "0.1999999999999999" | "39136.27376999999" | "0.091" | "17807.00456535" | "0.63021" | "satellite" | null | null | "Direct and Indirect emissions … | "Direct and Indirect emissions:… | "Calcination emissions factor (… | "Calcination emissions (t of CO… | "Fuel emissions factor (t of CO… | "Fuel emissions (t of CO2)" | "electricity_use_factor (MWh pe… | "electricity_use (MWh)" | "grid_emissions_intensity (t CO… | "model_methodology (e.g satelli… | null | null | "trace_1895560" | "high" | "high" | "low" | "low" | "low" | "low" | 2023 |
1895560 | "CHN" | "manufacturing" | "cement" | null | 2023-07-01 00:00:00 UTC | 2023-07-31 00:00:00 UTC | "month" | "co2" | 17822.384777 | 0.54 | "t of CO2 per t of cement" | 212917.0 | "t of cement" | 0.155011 | 33004.416254 | "t of cement" | 2024-08-21 00:00:00 UTC | null | "Anyang Hubo Clinker Co Ltd Hen… | "integrated dry" | 36.085841 | 114.068585 | "0.59744466" | "19718.312247113456" | "0.34" | "11221.501526214286" | "0.2" | "6600.883250714286" | "0.091" | "3003.401879075" | "0.63126" | "satellite" | null | null | "Direct and Indirect emissions … | "Direct and Indirect emissions:… | "Calcination emissions factor (… | "Calcination emissions (t of CO… | "Fuel emissions factor (t of CO… | "Fuel emissions (t of CO2)" | "electricity_use_factor (MWh pe… | "electricity_use (MWh)" | "grid_emissions_intensity (t CO… | "model_methodology (e.g satelli… | null | null | "trace_1895560" | "high" | "high" | "low" | "low" | "low" | "low" | 2023 |
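The relationship between activity, emission factor and emissions can be checked directly on the three rows displayed above. The second check (activity as capacity times capacity factor) is an observation from these displayed values rather than something the dataset documentation is quoted as stating here.

```python
import math

# Values copied from the three monthly rows above (source_id 1895560).
# (capacity, capacity_factor, activity, emissions_factor, emissions_quantity)
rows = [
    (212917.0, 0.390625, 83170.627083, 0.54, 44912.138625),
    (212917.0, 0.91905, 195681.36885, 0.54, 105667.939179),
    (212917.0, 0.155011, 33004.416254, 0.54, 17822.384777),
]


def is_consistent(capacity, capacity_factor, activity, factor, emissions,
                  rel_tol=1e-4):
    """Check both relationships, tolerating rounding in the displayed values."""
    activity_ok = math.isclose(capacity * capacity_factor, activity,
                               rel_tol=rel_tol)
    emissions_ok = math.isclose(activity * factor, emissions, rel_tol=rel_tol)
    return activity_ok and emissions_ok


all_consistent = all(is_consistent(*row) for row in rows)
print(all_consistent)
```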
Looking now at emissions and gathering by country, we get a plot relatively similar to what we saw in the section above.
Again, France and the Netherlands barely appear in this graph, and Mozambique’s forest management practices have an enormous impact.
px.bar(
(sdf_gy
.group_by(c_iso3_country, c_subsector)
.agg(c_emissions_quantity.sum())
.sort([c_emissions_quantity], descending=True)
.filter(c_iso3_country.is_in(["CHN", "USA", "IND", "RUS","MOZ","FRA", "NLD"]))
.collect()
)
,x=ISO3_COUNTRY, y=EMISSIONS_QUANTITY,color=SUBSECTOR,log_y=False)
Looking by sector, it is easy to see why the forestry and land use sector is complex: it would dominate both in retention and in emission.
px.bar(
(sdf_gy
.group_by(c_sector, c_subsector)
.agg(c_emissions_quantity.sum())
.sort([c_emissions_quantity], descending=True)
.collect()
)
,x=SECTOR, y=EMISSIONS_QUANTITY,color=SUBSECTOR,log_y=False)
Conclusion#
In this section, we saw:
how to access the Climate TRACE dataset in a modern data processing tool (Polars)
how to produce high-level statistics per country and per sector
From the data, it should be clear that tackling emissions is a global issue, in which a few countries dominate. Some countries with large economic outputs (France, the Netherlands) can have minimal emissions, in part because they have low-emission means of producing electricity and because they have outsourced their industries. Some others with comparably small economic outputs or populations (e.g. Mozambique) have a disproportionate impact because of the changes occurring in their ecosystems.
See also
Exercise: Take your country, look at its sources of emissions, and explore how the different gases provide different perspectives.