Methodology
1. Disclaimer
2. Methods
3. Schema
Environmental Score Review
The following provides an overview of the methodology used to compute the GED's indices. The World Bank team is preparing a comprehensive document with additional details on the methodology. The team will also work with a small cohort of local governments to further refine the overall methodology and develop country-specific case studies and use cases.
Methods
2.1 Indicator Selection
We used a wide variety of geospatial, publicly available, and frequently updated datasets to construct the indicators that make up the Green Economy Diagnostics Tool's scores. The scalable and modular nature of these datasets allows us to rapidly deploy the GED for any country on demand. They were also chosen based on the availability and comprehensiveness of their technical documentation and existing validation work.
Datasets were sought based on two themes: Economic Development and Environmental Performance.
Economic Development indicators serve as proxies for traditional economic indicators such as Gross Domestic Product (GDP), which can be infrequently updated and limited in spatial granularity.
Environmental Performance indicators were selected based on how they reflect a region's environmental conditions. They are further classified into three sub-domains: Air Pollution, Extreme Weather Conditions, and Green Space.
2.2 Principal Component Analysis
The GED's Economic and Environmental Scores are each composed of several sub-scores. Since these sub-scores are constructed from many different metrics, with the prospect of additional metrics being incorporated in the future, we cannot combine them by simple addition. Instead, we use a method called Principal Component Analysis (PCA), which extracts the most relevant information from these metrics and combines them into indices that represent the above components. PCA is endorsed by the OECD as a recommended technique for constructing composite indicators. The values of the reduced indices from PCA are then normalized to a scale of 0-100.
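The idea of combining standardized metrics through PCA and rescaling the result can be sketched as follows. This is a minimal illustration, not the GED's implementation: it approximates only the first principal component with power iteration, uses min-max scaling to reach 0-100, and all function names and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Subtract the mean and scale to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def leading_component(rows, iterations=500):
    """Approximate the first principal axis of centered rows by power iteration."""
    n, d = len(rows), len(rows[0])
    # Covariance matrix of the (already centered) data.
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    # Uneven starting vector, so it is not orthogonal to the leading axis by symmetry.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_index(columns):
    """Combine several metric columns into one index scaled to 0-100 (assumed min-max)."""
    rows = list(zip(*(standardize(c) for c in columns)))
    axis = leading_component(rows)
    scores = [sum(a * b for a, b in zip(row, axis)) for row in rows]
    lo, hi = min(scores), max(scores)
    return [100 * (s - lo) / (hi - lo) for s in scores]


# Hypothetical metrics for four regions (not GED data).
luminosity = [10.0, 20.0, 30.0, 40.0]
growth = [0.01, 0.03, 0.02, 0.05]
index = pca_index([luminosity, growth])
```

The sign of a principal component is arbitrary, so a production pipeline would need to orient it so that higher values correspond to the intended meaning of the score.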
2.3 Economic Score Overview
This section presents the methodology for calculating an economic score for different regions, which aims to represent their levels of economic development. It is based on two factors, each measuring economic development from a different perspective: utility and built area coverage.
Here are the crucial components:
- 1. Utility: Composed of nighttime luminosity and its annual growth. Both are computed from satellite imagery and serve as a useful proxy for electricity consumption and economic activity.
- 2. Built Area: We use satellite images to see how much of a region is covered by man-made structures and whether this is increasing or decreasing over time. The presence of man-made structures is a good indicator of economic activity.
Finally, we use Principal Component Analysis to construct each of the above components of the Economic Score. We then take the weighted average of the component scores as the Economic Score. This score spans from 0 to 100, with higher values denoting a more robust and productive economy and indicating positive economic development in the region. Conversely, lower scores could signal economic slowdown or stagnation. The Economic Score thus gives an overall snapshot of the region's economic health and trajectory.
2.3.1 Utility Score Calculations
- 1. Luminosity data processing: The raw luminosity data is gathered from the NASA Black Marble VIIRS dataset. It is then further processed using 505Economics' algorithm.
- 2. Zonal statistics: We take the sum of nighttime luminosity values of the pixels comprising the chosen region for every year with available data.
- 3. Luminosity Growth Rate Calculation: The luminosity growth rate is calculated as (Current Year Luminosity - Previous Year Luminosity) / Previous Year Luminosity.
- 4. Utility Score Calculation: The luminosity and luminosity growth rate are standardized (subtracting the mean and scaling to unit variance) and transformed into the Utility Score using Principal Component Analysis. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
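Steps 2 and 3 above can be sketched as a zonal sum followed by a year-over-year growth rate. The pixel values are made up for illustration, and the function name is hypothetical; the sketch also assumes the years in the input are consecutive.

```python
def annual_growth(values_by_year):
    """Growth rate: (current - previous) / previous for each consecutive pair of years."""
    years = sorted(values_by_year)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        previous = values_by_year[prev]
        # Growth is undefined when the previous value is zero.
        growth[curr] = (values_by_year[curr] - previous) / previous if previous else float("nan")
    return growth


# Hypothetical luminosity pixel values for one region (not GED data).
pixels_by_year = {2019: [40.0, 60.0], 2020: [50.0, 60.0], 2021: [45.0, 54.0]}

# Zonal statistic: sum of the pixel values inside the region, per year.
luminosity_by_year = {year: sum(pixels) for year, pixels in pixels_by_year.items()}
luminosity_growth = annual_growth(luminosity_by_year)
```

The first year has no growth value because there is no previous year to compare against.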
2.3.2 Built Area Calculations
- 1. Land cover data processing: The raw Dynamic World V1 land cover data is extracted using Google Earth Engine. The mode of the annual pixel classification value is taken.
- 2. Zonal Statistics: The percentage of the pixels labelled “built area” (human built structures) in a chosen region for a chosen year is calculated.
- 3. Built Area Growth Calculation: The Built Area Annual Growth is then calculated as (Current Year Built Area Coverage - Previous Year Built Area Coverage) / Previous Year Built Area Coverage.
- 4. Built Area Score Calculation: The Built Area Coverage and Built Area growth rate are standardized (subtracting the mean and scaling to unit variance) and transformed into the Built Area Score using Principal Component Analysis. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
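Steps 1 and 2 above (annual mode per pixel, then the share of pixels in a target class) can be illustrated with plain Python lists. In practice this runs in Google Earth Engine; the label strings, function names, and sample observations below are assumptions made for the sketch.

```python
from statistics import mode

# Hypothetical label for the built-area class in this sketch.
BUILT_CLASSES = {"built"}


def annual_mode(pixel_observations):
    """For each pixel, take the most frequent class label observed during the year."""
    return [mode(observations) for observations in pixel_observations]


def coverage_percent(labels, target_classes):
    """Percentage of pixels whose annual label belongs to the target classes."""
    return 100 * sum(label in target_classes for label in labels) / len(labels)


# Four pixels, each with three classifications over one year (made-up data).
observations = [
    ["built", "built", "grass"],
    ["trees", "trees", "built"],
    ["built", "crops", "built"],
    ["grass", "grass", "grass"],
]
annual_labels = annual_mode(observations)
built_coverage = coverage_percent(annual_labels, BUILT_CLASSES)
```

The same two helpers would apply to any other set of classes by passing a different `target_classes` set.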
2.3.3 Economic Score Calculations
The Economic Score is calculated as the weighted average of a region's Utility and Built Area Scores for a given year. The weights are set to be equal. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
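A minimal sketch of the equal-weight average and the per-year ranking is given below. The text does not specify exactly how scores are "ranked relative to the 0-100 scale", so the rank-based scaling here (lowest rank maps to 0, highest to 100, ties share an average rank) is an assumption, as are the function names and sample scores.

```python
def rank_to_0_100(values):
    """Map values to 0-100 by rank within the group (assumed interpretation)."""
    n = len(values)
    if n == 1:
        return [100.0]
    scaled = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        average_rank = below + (equal - 1) / 2  # ties share the average rank
        scaled.append(100 * average_rank / (n - 1))
    return scaled


def economic_scores(utility, built_area, weights=(0.5, 0.5)):
    """Inputs map (region, year) to a component score; weights are equal by default."""
    combined = {
        key: weights[0] * utility[key] + weights[1] * built_area[key]
        for key in utility
    }
    result = {}
    for year in {y for _, y in combined}:
        keys = [key for key in combined if key[1] == year]
        ranked = rank_to_0_100([combined[key] for key in keys])
        result.update(zip(keys, ranked))
    return result


# Hypothetical component scores for three regions in one year.
utility = {("A", 2020): 80.0, ("B", 2020): 40.0, ("C", 2020): 60.0}
built_area = {("A", 2020): 60.0, ("B", 2020): 20.0, ("C", 2020): 90.0}
scores = economic_scores(utility, built_area)
```

Here the averages are 70, 30, and 75 for regions A, B, and C, so B ranks lowest and C highest within 2020.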
2.4 Environmental Score Overview
This section explains how an environmental score for different regions is calculated. This score gives an idea of how the environment is performing in a particular region, based on various factors like air quality, temperature, precipitation, and green spaces.
Here are the most important components:
- 1. Air Quality: This is measured by looking at five different types of pollutants in the air. Each pollutant is scored on a scale of 0-500 (with higher scores indicating worse air quality). Then, the score for the worst pollutant is chosen to represent overall air quality. This number tells us about the air quality for every day in the year, and we then determine the percentage of days per year that had good, moderate, or poor air quality.
- 2. Weather: We look at the temperature for every day in the past decade, find the yearly change in the maximum recorded temperature, determine what counts as extremely hot (top 10%) and extremely cold (bottom 10%), and then count how many such days there were in each year. Similarly, we also keep track of the change in the maximum amount of rainfall in a year, and the proportion of extremely wet or dry days.
- 3. Green Space: We use satellite images to see how much of a region is covered by green spaces like forests, grass, and shrubs, and whether this is increasing or decreasing over time. We also look at the annual change in the region's built environment when there is a decrease in green space.
As with the components of the Economic Score, we use Principal Component Analysis to construct the individual components of the Environment Score. Their weighted average is then computed as the Environment Score, where higher scores represent a healthier environment. Some factors, like good air quality or more green spaces, increase the score. Others, like high levels of pollution or extreme temperatures, decrease the score, although the relationship is not always linear.
2.4.1 Air Quality Score
We first calculate the Air Quality Index (AQI) based on the measurements of five pollutants: PM2.5, NO2, CO, SO2, and O3. We create sub-indexes for each of these pollutants. The concentration measurements of these pollutants are in different units, so they are first converted into a common scale. The calculation follows the methodology defined by the United States Environmental Protection Agency (EPA).
1. PM2.5 Sub-Index Calculation: The PM2.5 values are measured in µg/m³. These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The PM2.5 Sub-Index value is a scaled representation of the PM2.5 concentration.
2. NO2 Sub-Index Calculation: The NO2 values are expressed in parts per billion (PPB). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The NO2 sub-index is a scaled representation of the NO2 concentration.
3. CO Sub-Index Calculation: The CO is expressed in mg/m³ (milligrams per cubic meter of air). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The CO Sub-Index is a scaled representation of the CO concentration.
4. SO2 Sub-Index Calculation: The SO2 values are expressed in µg/m³ (micrograms per cubic meter of air). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The SO2 Sub-Index is a scaled representation of the SO2 concentration.
5. O3 Sub-Index Calculation: O3 is measured in µg/m³ (micrograms per cubic meter of air). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The O3 Sub-Index is a scaled representation of the O3 concentration.
6. Final AQI Calculation: The final Air Quality Index (AQI) is a single number summarizing air quality, calculated from five pollutant sub-indices: PM2.5, SO2, NO2, CO, and O3.
The AQI calculation follows two rules:
- •Rule 1: The PM2.5 or PM10 sub-index must be available, as particulate matter significantly impacts human health.
- •Rule 2: At least three out of the five total sub-indices must be available to ensure the AQI reflects various pollutants.
If these conditions aren't met, the AQI is set to 'NaN', a marker indicating a missing value.
The final AQI itself is calculated as the maximum value of these sub-indices. This means that the AQI reflects the level of the most problematic pollutant at that time.
For example, if the NO2 level is higher than the other pollutants, the NO2 sub-index will be the final AQI score. This is done because the health effects of the worst pollutant are considered to represent the overall air quality.
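The re-scaling and the final AQI rules can be sketched as follows. The sub-index uses the piecewise linear interpolation form found in EPA guidance; the breakpoint table in the example is purely illustrative (not official EPA values), and the function and constant names are hypothetical.

```python
import math

FIVE_POLLUTANTS = ("PM2.5", "NO2", "CO", "SO2", "O3")
PARTICULATE = ("PM2.5", "PM10")


def sub_index(concentration, breakpoints):
    """Linear interpolation within the bracket (c_lo, c_hi, i_lo, i_hi) containing the value."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= concentration <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (concentration - c_lo) + i_lo
    return math.nan  # outside every bracket, or concentration is NaN


def final_aqi(sub_indices):
    """Maximum of the available sub-indices, or NaN if the two rules are not met."""
    available = {
        name: value
        for name, value in sub_indices.items()
        if value is not None and not math.isnan(value)
    }
    has_particulate = any(name in available for name in PARTICULATE)    # Rule 1
    enough = sum(name in available for name in FIVE_POLLUTANTS) >= 3   # Rule 2
    if not (has_particulate and enough):
        return math.nan
    return max(available.values())


# Illustrative breakpoints only; real tables come from the EPA guidance.
example_breakpoints = [(0.0, 10.0, 0, 50), (10.0, 20.0, 50, 100)]
pm25_index = sub_index(15.0, example_breakpoints)

aqi = final_aqi({"PM2.5": 80.0, "NO2": 120.0, "CO": 30.0})
missing = final_aqi({"NO2": 120.0, "CO": 30.0, "SO2": 10.0})
```

In the first call NO2 has the highest sub-index, so it determines the AQI; the second call has no particulate sub-index and therefore yields NaN.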
AQI values are on a scale from 0 to 500, where a higher value indicates poorer air quality with greater potential impact on human health. AQI scores are as follows:
- •0 to 50 represents good air quality
- •51 to 100 is satisfactory
- •101 to 200 is moderate
- •201 to 300 is poor
- •301 to 400 is very poor
- •401 to 500 is severe
For each year, we then count the number of days each sub-national region experiences for each level of AQI (i.e. good, satisfactory, moderate, poor, very poor or severe). The percentage of days of a given region in a given year that fall into each AQI category is calculated.
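Counting days per category and converting the counts to percentages might look like the sketch below. The category bounds follow the scale listed above; skipping NaN days and treating values above 500 as "severe" are assumptions, and the names are hypothetical.

```python
import math
from collections import Counter

# Upper bound of each AQI category, as listed in the scale above.
AQI_CATEGORIES = [
    (50, "good"),
    (100, "satisfactory"),
    (200, "moderate"),
    (300, "poor"),
    (400, "very poor"),
    (500, "severe"),
]


def categorize(aqi):
    """Return the category name for one daily AQI value."""
    for upper, name in AQI_CATEGORIES:
        if aqi <= upper:
            return name
    return "severe"  # assumption: values above 500 are treated as severe


def category_shares(daily_aqi):
    """Percentage of valid (non-NaN) days in each category for one region-year."""
    valid = [a for a in daily_aqi if not math.isnan(a)]
    if not valid:
        return {name: math.nan for _, name in AQI_CATEGORIES}
    counts = Counter(categorize(a) for a in valid)
    return {name: 100 * counts[name] / len(valid) for _, name in AQI_CATEGORIES}


# Made-up daily AQI values for one region in one year.
shares = category_shares([30.0, 45.0, 120.0, 250.0, math.nan])
```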
Finally, we compute the Air Quality Score by applying PCA to metrics representing the percentage of days a region experiences different levels of AQI and the annual growth rates of individual air pollutants (NO2, CO, SO2, O3, and PM2.5). The individual scores are then grouped by year and ranked relative to the 0-100 range.
2.4.2 Extreme Weather Score
1. Extreme Temperatures: We take data on the average temperature of each region from 2000-2010 to determine the 90th and 10th percentiles. We use these to identify a hot threshold (90th percentile) and a cold threshold (10th percentile).
For each year, we count the number of days in each region where the average temperature was above the hot threshold (90th percentile). The number of these extremely hot days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely hot temperatures. Similarly, for each year, we count the number of days in each region where the average temperature was below the cold threshold (10th percentile). The number of these extremely cold days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely cold temperatures. We also take the annual percentage change in the proportion of hot days and of cold days.
We additionally compute the annual percentage change in the region's maximum temperature.
2. Extreme Precipitation: We take data on the average precipitation of each region from 2000-2010 to determine the 90th and 10th percentiles. We use these to identify a wet threshold (90th percentile) and a dry threshold (10th percentile).
For each year, we count the number of days in each region where the average precipitation was above the wet threshold (90th percentile). The number of these extremely wet days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely high precipitation. Similarly, for each year, we count the number of days in each region where the average precipitation was below the dry threshold (10th percentile). The number of these extremely dry days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely low precipitation. We also take the annual percentage change in the proportion of wet days and of dry days.
We additionally compute the annual percentage change in the region's maximum precipitation.
3. Extreme Weather score calculation: We apply PCA on these metrics to produce the Extreme Weather score. The individual scores are then grouped by year and ranked relative to the 0-100 range.
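The threshold and proportion steps for temperature or precipitation can be sketched as below. The choice of the "inclusive" quantile method is an assumption (the text does not say how percentiles are interpolated), and the baseline and yearly values are made up.

```python
from statistics import quantiles


def extreme_thresholds(baseline_daily_values):
    """Return the (10th percentile, 90th percentile) of the baseline period."""
    deciles = quantiles(baseline_daily_values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def extreme_day_shares(daily_values, low, high):
    """Proportion of days below the low threshold and above the high threshold."""
    total = len(daily_values)
    below = sum(v < low for v in daily_values) / total
    above = sum(v > high for v in daily_values) / total
    return below, above


# Hypothetical baseline (standing in for 2000-2010 daily averages) and one later year.
baseline = list(range(101))
cold_threshold, hot_threshold = extreme_thresholds(baseline)
cold_share, hot_share = extreme_day_shares([5, 50, 95, 99], cold_threshold, hot_threshold)
```

The same two functions would serve for the wet and dry thresholds by passing daily precipitation instead of temperature.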
2.4.3 Green Space Score
- 1. Land cover data processing: The raw Dynamic World V1 land cover data is extracted using Google Earth Engine. The mode of the annual pixel classification value is taken.
- 2. Zonal Statistics: The percentage of pixels classified as “green space” (forest, shrubs and scrubs, grass) in a chosen region for a chosen year is calculated.
- 3. Green Space Growth Calculation: The Green Space Annual Growth is then calculated as (Current Year Green Space Coverage - Previous Year Green Space Coverage) / Previous Year Green Space Coverage.
- 4. Green Space Score Calculation: The Green Space Coverage, Green Space growth rate, and Built Area growth rate are standardized (subtracting the mean and scaling to unit variance) and transformed into the Green Space Score using Principal Component Analysis. For regions with a positive annual percentage change in built coverage and a negative annual percentage change in green coverage, the built coverage value is multiplied by -1. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
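The sign rule in step 4 can be expressed as a small function. This sketch assumes the "built coverage value" refers to the Built Area growth rate that enters the PCA, which the text does not state explicitly; the function name is hypothetical.

```python
def adjusted_built_growth(built_growth, green_growth):
    """Flip the sign of built-area growth when it coincides with shrinking green space."""
    if built_growth > 0 and green_growth < 0:
        return -built_growth
    return built_growth


# Built area expanding while green space shrinks: the value is penalized.
penalized = adjusted_built_growth(0.05, -0.02)
# Built area expanding while green space also grows: the value is unchanged.
unchanged = adjusted_built_growth(0.05, 0.01)
```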
2.4.4 Environmental Score Calculations
The Environment Score is calculated as the weighted average of a region's Air Quality, Extreme Weather, and Green Space Scores for a given year. The weights are set to be equal. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
We calculate the overall environmental score using the following variables:
These indicators are first standardized to have a mean of 0 and a standard deviation of 1, and then principal component analysis (PCA) is used to combine them into their respective component scores. This is done to reduce dimensionality and deal with multicollinearity among the variables. After PCA, the weighted average of the component scores is computed as the environmental score.
Note that indicators which are harmful to the environment (like high ozone or PM2.5 levels, or extreme temperatures) are multiplied by -1 before analysis so that they decrease, not increase, the environmental score.
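The orientation and standardization of a single indicator can be sketched as follows. The function names and the sample PM2.5 values are hypothetical, and population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def orient(values, harmful):
    """Multiply harmful indicators by -1 so that higher always means better."""
    return [-v for v in values] if harmful else list(values)


def standardize(values):
    """Rescale to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up PM2.5 levels for three regions; PM2.5 is a harmful indicator.
pm25 = [10.0, 20.0, 30.0]
prepared = standardize(orient(pm25, harmful=True))
```

After this step the region with the lowest PM2.5 level has the highest standardized value, so it pushes the environmental score up rather than down.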
Schema
Generative AI & Policy
Generative AI is a type of Artificial Intelligence capable of generating content. It is powered by large-scale data and AI foundation models built upon millions of valuable and reliable data points. These models can multi-task and perform tasks that are not necessarily straightforward (e.g. summarization, Q&A, classification, analysis, recommendations). The infrastructure now exists to use these large models and adapt them to specific use cases based on the needs of a person or entity. Because so many resources are available online, this adaptation requires minimal training data, so the potential for personalizing Generative AI services is vast. Still, using Generative AI for specific use cases is a relatively new industry. It is constantly changing, and the possibilities for its implementation grow day by day. Working in this field therefore requires continual adaptation and learning.
In terms of policy making, generative AI can be a catalyst for more efficient, inclusive and responsive government through many services such as:
- •Digitizing services
- •Efficiently analysing large amounts of data
- •Effectively summarizing large datasets and highlighting their implications
- •Summarizing location and subject-specific policy documents
Generative AI within the GED
We decided to combine the capabilities of the GED with those of Generative AI to produce accurate, relevant, and useful insights and recommendations for policy makers.
This would allow policy makers to:
- •Quickly make sense of the large amounts of data found in the GED
- •Focus on the most important indicators and their significance
- •Highlight the practical effectiveness of the GED indicators
- •Most importantly, derive accurate, relevant and specific policy recommendations based on these indicators
So far, we have successfully produced a prototype (focusing on Serbia) that works on the municipality/district level and is capable of:
- Summarizing the current economic and environmental scores of the municipality
- Briefly describing the trends in the environmental score based on its three components
- Describing the provincial and national ranking of each municipality based on its economic and environmental scores (i.e. how the economic and environmental scores of the municipality rank compared to other municipalities in the same district and to other municipalities across the country)
- Providing economic and environmental recommendations based on simple background information extracted about the municipality and on the indicators described above
What Technology do we currently use?
- •OpenAI API (paid)
- •Google PaLM API (free trial for now, but not available across the EU)
- •LangChain
Projected Current Costs:
If you have any questions regarding the above methodologies, please leave feedback by following the link below.