
Methodology

1 Environmental Score Review

The following provides an overview of the methodology used to compute the GED's indices. The World Bank team is preparing a comprehensive document with additional details on the methodology. The team will also work with a small cohort of local governments to further refine the overall methodology and develop country-specific case studies and use cases.

2 Methods

2.1 Indicator Selection

We used a wide variety of geospatial, publicly available, and frequently updated datasets to construct the indicators that make up the Green Economy Diagnostics Tool's scores. The scalable and modular nature of these datasets allows us to rapidly deploy the GED for any country on demand. The datasets were also chosen based on the availability and comprehensiveness of their technical documentation and existing validation work.

Datasets were sought based on two themes: Economic Development and Environmental Performance.

Economic Development indicators are to be taken as proxies for traditional economic indicators such as Gross Domestic Product (GDP), which can be infrequently updated and limited in spatial granularity.

Environmental Performance indicators were selected based on how they reflect a region's environmental conditions. They are further classified into three sub-domains: Air Pollution, Extreme Weather Conditions, and Green Space.

2.2 Principal Component Analysis

The GED's Economic and Environmental Scores are each composed of several sub-scores. Since these sub-scores are constructed from many different metrics, with the prospect of additional metrics being incorporated in the future, we cannot combine them by simple addition. Instead, we use Principal Component Analysis (PCA), a method that extracts the most relevant information from these metrics and combines it into indices that represent the above components. The OECD recommends PCA as a technique for constructing composite indicators. The values of the reduced indices from PCA are then normalized to a scale of 0-100.
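As a rough illustration of the idea above, the sketch below standardizes a few metrics, projects them onto the first principal component, and rescales the result to 0-100. It is a minimal pure-Python sketch under stated assumptions, not the GED's implementation: the function names and sample values are hypothetical, only the first component is kept, min-max scaling stands in for the normalization step, and the sign of a principal component is arbitrary in general.

```python
import math

def standardize(column):
    """Z-score a list of numbers (mean 0, unit variance)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    n, k = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(k)] for i in range(k)]
    # Asymmetric start so the vector is not orthogonal to the leading eigenvector.
    vec = [1.0 / (i + 1) for i in range(k)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

def pca_index(metrics):
    """metrics: dict of name -> list of values (one per region). Returns 0-100 scores."""
    columns = [standardize(values) for values in metrics.values()]
    rows = list(zip(*columns))
    weights = first_principal_component(rows)
    raw = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    lo, hi = min(raw), max(raw)
    return [100 * (x - lo) / (hi - lo) for x in raw]  # assumed min-max normalization

# Hypothetical values for four regions.
scores = pca_index({
    "luminosity": [10, 20, 30, 40],
    "luminosity_growth": [0.01, 0.02, 0.05, 0.04],
})
```

Here the region that is lowest on both metrics receives 0 and the region with the highest combined standing receives 100.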

2.3 Economic Score Overview

This section presents the methodology for calculating an economic score that represents each region's level of economic development. The score is based on two factors, each measuring economic development from a different perspective: utility and built area coverage.
Here are the crucial components:

  • 1. Utility: Composed of nighttime luminosity and annual luminosity growth. Both are computed from satellite imagery and serve as a useful proxy for electricity consumption and economic activity.
  • 2. Built Area: We use satellite images to see how much of a region is covered by man-made structures and whether this is increasing or decreasing over time. The presence of man-made structures is a good indicator of economic activity.

Finally, we use Principal Component Analysis to construct each of the individual components above, and take the weighted average of the resulting component scores as the Economic Score. This score spans from 0 to 100, with higher values denoting a more robust and productive economy and indicating positive economic development in the region. Conversely, lower scores could signal economic slowdown or stagnation. The Economic Score thus gives an overall snapshot of the region's economic health and trajectory.

2.3.1 Utility Score Calculations

  • 1. Luminosity data processing: The raw luminosity data is gathered from the NASA Black Marble VIIRS dataset. It is then further processed using 505Economics' algorithm.
  • 2. Zonal statistics: We take the sum of nighttime luminosity values of the pixels comprising the chosen region for every year with available data.
  • 3. Luminosity Growth Rate Calculation: The luminosity growth rate is calculated as (Current Year Luminosity - Previous Year Luminosity) / Previous Year Luminosity.
  • 4. Utility Score Calculation: The luminosity and luminosity growth rate are standardized (subtracting the mean and scaling to unit variance) and transformed into the Utility Score using Principal Component Analysis. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
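Steps 2 and 3 above can be sketched as follows. This is a minimal illustration: the function names and pixel values are hypothetical, and it assumes the "previous year" is simply the preceding year with available data.

```python
def zonal_sum(pixel_values):
    """Sum the luminosity of all pixels that fall inside a region."""
    return sum(pixel_values)

def growth_rates(values_by_year):
    """Return {year: (current - previous) / previous} for each year after the first."""
    years = sorted(values_by_year)
    rates = {}
    for prev_year, year in zip(years, years[1:]):
        previous = values_by_year[prev_year]
        rates[year] = (values_by_year[year] - previous) / previous
    return rates

# Hypothetical pixel values for one region over three years.
pixels = {2019: [40.0, 60.0], 2020: [50.0, 60.0], 2021: [45.0, 54.0]}
luminosity = {year: zonal_sum(vals) for year, vals in pixels.items()}
rates = growth_rates(luminosity)
```

With these values the regional sums are 100, 110 and 99, giving growth rates of 0.1 for 2020 and -0.1 for 2021. The first year has no growth rate because there is no earlier year to compare against.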

2.3.2 Built Area Calculations

  • 1. Land cover data processing: The raw Dynamic World V1 land cover data is extracted using Google Earth Engine. The mode of the annual pixel classification value is taken.
  • 2. Zonal Statistics: The percentage of the pixels labelled “built area” (human built structures) in a chosen region for a chosen year is calculated.
  • 3. Built Area Growth Calculation: The Built Area Annual Growth is then calculated as (Current Year Built Area Coverage - Previous Year Built Area Coverage) / Previous Year Built Area Coverage.
  • 4. Built Area Score Calculation: The Built Area Coverage and Built Area growth rate are standardized (subtracting the mean and scaling to unit variance) and transformed into the Built Area Score using Principal Component Analysis. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
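Steps 1 and 2 above (annual modal class per pixel, then the share of pixels labelled built) can be sketched as below. The helper names and the toy observations are hypothetical, and the class code 6 for "built" is an assumption about the Dynamic World label band that should be checked against the dataset documentation.

```python
from collections import Counter

BUILT = 6  # assumed Dynamic World class code for "built"

def annual_mode(labels):
    """Most frequent class label among one pixel's observations in a year."""
    return Counter(labels).most_common(1)[0][0]

def class_share(pixel_series, classes):
    """Percentage of pixels whose annual modal class is in `classes`."""
    modes = [annual_mode(observations) for observations in pixel_series]
    hits = sum(1 for m in modes if m in classes)
    return 100.0 * hits / len(modes)

# Four pixels in a region, each with three classifications in the year.
region = [[6, 6, 4], [1, 1, 1], [6, 4, 6], [2, 2, 6]]
built_pct = class_share(region, {BUILT})
```

Two of the four pixels have "built" as their modal class, so the built area coverage is 50 percent. Passing a different set of class codes gives the coverage for any other land cover grouping.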

2.3.3 Economic Score Calculations

The Economic Score is calculated as the weighted average of a region's Utility and Built Area Scores for a given year. The weights are set to be equal. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
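A minimal sketch of this step follows. The text does not spell out how scores are "ranked relative to the 0-100 scale", so the rank-based scaling here (lowest rank maps to 0, highest to 100, ties share the average position) is an assumption, as are the function names and sample records.

```python
def rank_to_100(values):
    """Scale values to 0-100 by rank position; ties share their average position."""
    n = len(values)
    if n == 1:
        return [100.0]
    ordered = sorted(values)
    scaled = []
    for v in values:
        first = ordered.index(v)               # first position of this value
        last = n - 1 - ordered[::-1].index(v)  # last position of this value
        scaled.append(100.0 * ((first + last) / 2) / (n - 1))
    return scaled

def economic_scores(records, w_utility=0.5, w_built=0.5):
    """records: dicts with region, year, utility, built. Returns {(region, year): score}."""
    by_year = {}
    for rec in records:
        by_year.setdefault(rec["year"], []).append(rec)
    result = {}
    for year, rows in by_year.items():
        combined = [w_utility * r["utility"] + w_built * r["built"] for r in rows]
        for rec, score in zip(rows, rank_to_100(combined)):
            result[(rec["region"], year)] = score
    return result

# Hypothetical component scores for three regions in one year.
records = [
    {"region": "A", "year": 2020, "utility": 80.0, "built": 60.0},
    {"region": "B", "year": 2020, "utility": 20.0, "built": 40.0},
    {"region": "C", "year": 2020, "utility": 50.0, "built": 50.0},
]
econ = economic_scores(records)
```

Because ranking happens within each year, a region's score reflects its standing relative to other regions in that year rather than an absolute level.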

2.4 Environmental Score Overview

This section explains how the environmental score for each region is calculated. The score indicates how the environment is performing in a particular region, based on factors such as air quality, temperature, precipitation, and green space.
Here are the most important components:

  • 1. Air Quality: This is measured by looking at five different types of pollutants in the air. Each pollutant is scored on a scale of 0-500 (with higher scores indicating worse air quality). The score for the worst pollutant is then chosen to represent overall air quality. This gives an air quality value for every day in the year, from which we determine the percentage of days per year that had good, moderate, or poor air quality.
  • 2. Weather: We look at the temperature for every day in the past decade to define what counts as extremely hot (top 10%) and extremely cold (bottom 10%), and then count how many such days there were in each year. We also track the yearly change in the maximum recorded temperature. Similarly, we track the change in the maximum amount of rainfall in a year and the proportion of extremely wet or dry days.
  • 3. Green Space: We use satellite images to see how much of a region is covered by green spaces like forests, grass, and shrubs, and whether this is increasing or decreasing over time. We also take a look at the annual change in the region's built environment when there's a decrease in green space.

As with the components of the Economic Score, we use Principal Component Analysis to construct the individual components of the Environment Score. Their weighted average is then computed as the Environment Score, where higher scores represent a healthier environment. Some factors, like good air quality or more green spaces, increase the score. Others, like high levels of pollution or extreme temperatures, decrease the score, although the relationship is not always linear.

2.4.1 Air Quality Score

We first calculate the Air Quality Index (AQI) based on the measurements of five pollutants: PM2.5, NO2, CO, SO2, and O3. We create sub-indexes for each of these pollutants. The concentration measurements of these pollutants are in different units, so they are first converted into a common scale. The calculation follows the methodology defined by the United States Environmental Protection Agency (EPA).
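The conversion to a common scale can be sketched with the piecewise-linear interpolation the EPA uses for AQI sub-indices: within a concentration band [C_lo, C_hi] mapped to an index band [I_lo, I_hi], the sub-index is (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo. The breakpoint table below is made up purely for illustration; the values actually used by the GED are not given in the text, and the function name is hypothetical.

```python
import math

# Illustrative breakpoints only: (c_lo, c_hi, i_lo, i_hi). Not the EPA's or the GED's values.
EXAMPLE_BREAKPOINTS = [
    (0.0, 10.0, 0.0, 50.0),
    (10.0, 20.0, 50.0, 100.0),
    (20.0, 50.0, 100.0, 200.0),
]

def sub_index(concentration, breakpoints):
    """Linearly interpolate a concentration into its index band; NaN if out of range."""
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= concentration <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (concentration - c_lo) + i_lo
    return float("nan")

low = sub_index(5.0, EXAMPLE_BREAKPOINTS)
mid = sub_index(15.0, EXAMPLE_BREAKPOINTS)
out_of_range = sub_index(80.0, EXAMPLE_BREAKPOINTS)
```

Each pollutant would have its own breakpoint table in its own units, which is what allows sub-indices for different pollutants to be compared on the same 0-500 scale.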

Air Quality Calculation Breakdown

  • 1. PM2.5 Sub-Index Calculation: The PM2.5 values are measured in µg/m³. These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The PM2.5 Sub-Index value is a scaled representation of the PM2.5 concentration.
  • 2. NO2 Sub-Index Calculation: The NO2 values are expressed in parts per billion (ppb). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The NO2 Sub-Index is a scaled representation of the NO2 concentration.
  • 3. CO Sub-Index Calculation: The CO values are expressed in mg/m³ (milligrams per cubic meter of air). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The CO Sub-Index is a scaled representation of the CO concentration.
  • 4. SO2 Sub-Index Calculation: The SO2 values are expressed in µg/m³ (micrograms per cubic meter of air). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The SO2 Sub-Index is a scaled representation of the SO2 concentration.
  • 5. O3 Sub-Index Calculation: The O3 values are measured in µg/m³ (micrograms per cubic meter of air). These are then re-scaled to create a value between 0-500 (based on guidance from the EPA). The O3 Sub-Index is a scaled representation of the O3 concentration.
  • 6. Final AQI Calculation: The final Air Quality Index (AQI) is a single number summarizing air quality, calculated from five pollutant sub-indices: PM2.5, SO2, NO2, CO, and O3.

    The AQI calculation follows two rules:

  • Rule 1: The PM2.5 or PM10 sub-index must be available, as particulate matter significantly impacts human health.
  • Rule 2: At least three out of the five total sub-indices must be available to ensure the AQI reflects various pollutants.

If these conditions aren't met, the AQI is set to 'NaN', a marker indicating a missing or undefined value.

The final AQI itself is calculated as the maximum value of these sub-indices. This means that the AQI reflects the level of the most problematic pollutant at that time.

For example, if the NO2 level is higher than the other pollutants, the NO2 sub-index will be the final AQI score. This is done because the health effects of the worst pollutant are considered to represent the overall air quality.

AQI values are on a scale from 0 to 500, where a higher value indicates poorer air quality with greater potential impact on human health. AQI scores are as follows:

  • 0 to 50 represents good air quality
  • 51 to 100 is satisfactory
  • 101 to 200 is moderate
  • 201 to 300 is poor
  • 301 to 400 is very poor
  • 401 to 500 is severe
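The two availability rules, the maximum rule, and the category bands above can be combined in a short sketch. The function and key names are hypothetical, and it assumes a missing sub-index is represented as None or NaN and that Rule 2 counts only the five named pollutants.

```python
import math

FIVE = ("PM2.5", "NO2", "CO", "SO2", "O3")
CATEGORIES = [(50, "good"), (100, "satisfactory"), (200, "moderate"),
              (300, "poor"), (400, "very poor"), (500, "severe")]

def final_aqi(sub_indices):
    """Return the maximum available sub-index, or NaN if Rule 1 or Rule 2 fails."""
    available = {name: value for name, value in sub_indices.items()
                 if value is not None and not math.isnan(value)}
    has_particulates = "PM2.5" in available or "PM10" in available  # Rule 1
    enough = sum(1 for name in FIVE if name in available) >= 3      # Rule 2
    if not (has_particulates and enough):
        return float("nan")
    return max(available.values())

def aqi_category(aqi):
    """Map an AQI value to its band; None when the AQI is missing."""
    if math.isnan(aqi):
        return None
    for upper, name in CATEGORIES:
        if aqi <= upper:
            return name
    return None

aqi = final_aqi({"PM2.5": 80.0, "NO2": 120.0, "CO": 30.0, "SO2": None})
category = aqi_category(aqi)
```

In this example NO2 has the highest sub-index, so the AQI is 120 and the day falls in the "moderate" band. A day without any particulate sub-index, or with fewer than three sub-indices, yields NaN and no category.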

For each year, we then count the number of days each sub-national region experiences for each level of AQI (i.e. good, satisfactory, moderate, poor, very poor or severe). The percentage of days of a given region in a given year that fall into each AQI category is calculated.
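The per-year tally can be sketched as follows. The function name and sample data are hypothetical, and the sketch assumes that days with a missing AQI are excluded from the denominator, which the text does not specify.

```python
from collections import Counter

LEVELS = ["good", "satisfactory", "moderate", "poor", "very poor", "severe"]

def category_shares(daily_categories):
    """Share of days (0-1.0) in each AQI category; days without an AQI (None) are skipped."""
    valid = [c for c in daily_categories if c is not None]
    if not valid:
        return {level: float("nan") for level in LEVELS}
    counts = Counter(valid)
    return {level: counts[level] / len(valid) for level in LEVELS}

# Hypothetical daily categories for one region (None marks a day with no AQI).
shares = category_shares(["good", "good", "moderate", None, "poor"])
```

The shares for a region and year sum to 1.0 across the six categories, which matches the 0-1.0 range listed for these indicators.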

Finally, we compute the Air Quality Score by applying PCA to metrics representing the percentage of days a region experiences each level of AQI and the annual growth rates of the individual air pollutants (NO2, CO, SO2, O3, and PM2.5). The individual scores are then grouped by year and ranked relative to the 0-100 scale.

2.4.2 Extreme Weather Score

1. Extreme Temperatures:

We take data on the average temperature of each region from 2000-2010 to determine the 90th and 10th percentiles. We use these to identify a hot threshold (90th percentile) and a cold threshold (10th percentile).

For each year, we count the number of days in each region where the average temperature was above the hot threshold (90th percentile). The number of these extremely hot days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely hot temperatures. Similarly, for each year, we count the number of days in each region where the average temperature was below the cold threshold (10th percentile). The number of these extremely cold days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely cold temperatures. We also take the annual percentage changes in the proportions of hot and cold days.

We additionally compute the annual percentage change in the region's maximum temperature.
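The threshold and proportion calculations above can be sketched as below; the same pattern applies to precipitation with wet and dry thresholds. The function names and data are hypothetical, and the percentile interpolation method (`method="inclusive"`) is an assumption, since the text does not say which one is used.

```python
import statistics

def thresholds(baseline_daily_values):
    """10th and 90th percentiles of the baseline period (2000-2010 in the text)."""
    deciles = statistics.quantiles(baseline_daily_values, n=10, method="inclusive")
    return deciles[0], deciles[-1]

def extreme_day_shares(daily_values, low, high):
    """Proportion of days below `low` and above `high` in one year."""
    n = len(daily_values)
    below = sum(1 for v in daily_values if v < low)
    above = sum(1 for v in daily_values if v > high)
    return below / n, above / n

# Toy baseline and a toy "year" of eight daily averages.
baseline = list(range(1, 101))
cold_threshold, hot_threshold = thresholds(baseline)
year = [5, 8, 12, 50, 60, 70, 95, 100]
cold_share, hot_share = extreme_day_shares(year, cold_threshold, hot_threshold)
```

Because the thresholds are fixed from the baseline period, a year can have more or fewer than 10 percent of its days in either tail, and that difference is what the indicator captures.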

2. Extreme Precipitation:

We take data on the average precipitation of each region from 2000-2010 to determine the 90th and 10th percentiles. We use these to identify a wet threshold (90th percentile) and a dry threshold (10th percentile).

For each year, we count the number of days in each region where the average precipitation was above the wet threshold (90th percentile). The number of these extremely wet days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely high precipitation. Similarly, for each year, we count the number of days in each region where the average precipitation was below the dry threshold (10th percentile). The number of these extremely dry days is then divided by the total number of days to get a ratio, representing the proportion of the year with extremely low precipitation. We also take the annual percentage changes in the proportions of wet and dry days.

We additionally compute the annual percentage change in the region's maximum precipitation.

3. Extreme Weather score calculation:

We apply PCA to these metrics to produce the Extreme Weather Score. The individual scores are then grouped by year and ranked relative to the 0-100 scale.

2.4.3 Green Space Score

  • 1. Land cover data processing: The raw Dynamic World V1 land cover data is extracted using Google Earth Engine. The mode of the annual pixel classification value is taken.
  • 2. Zonal Statistics: The percentage of pixels classified as “green space” (forest, shrubs and scrubs, grass) in a chosen region for a chosen year is calculated.
  • 3. Green Space Growth Calculation: The Green Space Annual Growth is then calculated as (Current Year Green Space Coverage - Previous Year Green Space Coverage) / Previous Year Green Space Coverage.
  • 4. Green Space Score Calculation: The Green Space Coverage, Green Space growth rate, and Built Area growth rate are standardized (subtracting the mean and scaling to unit variance) and transformed into the Green Space Score using Principal Component Analysis. For regions that have a positive annual percentage change in built coverage and a negative annual percentage change in green coverage, the built coverage value is multiplied by -1. The individual scores are then grouped by year and ranked relative to the 0-100 scale.
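The sign rule in step 4 can be sketched as follows. This is one interpretation rather than the GED's code: the function name is hypothetical, and the "otherwise 0" branch is taken from the effect column of the indicator table in 2.4.4 rather than from the step itself.

```python
def adjusted_built_value(value, built_growth, green_growth):
    """Built-area input to the Green Space Score.

    Counts against the score (sign flipped) only when built area is growing
    while green space is shrinking; otherwise it contributes nothing.
    """
    if built_growth > 0 and green_growth < 0:
        return -value
    return 0.0

# Built area grew 5% while green space shrank 2%: the built value is penalized.
penalized = adjusted_built_value(0.05, built_growth=0.05, green_growth=-0.02)
# Built area grew 5% and green space also grew: no effect.
neutral = adjusted_built_value(0.05, built_growth=0.05, green_growth=0.01)
```

The intent of the rule is that construction is only treated as environmentally harmful when it coincides with a loss of green cover.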

2.4.4 Environmental Score Calculations

The Environment Score is calculated as the weighted average of a region's Air Quality, Extreme Weather, and Green Space Scores for a given year. The weights are set to be equal. The individual scores are then grouped by year and ranked relative to the 0-100 scale.

We calculate the overall environmental score using the following variables:

| Indicator | Component | Description | Effect on Environmental Score | Data Type | Range of Values |
|---|---|---|---|---|---|
| GOOD | Air Score | Percentage of number of days of air quality deemed “Good” in the given year | Positive | float | 0-1.0 |
| SATISFACTORY | Air Score | Percentage of number of days of air quality deemed “Satisfactory” in the given year | Positive | float | 0-1.0 |
| MODERATE | Air Score | Percentage of number of days of air quality deemed “Moderate” in the given year | Positive | float | 0-1.0 |
| POOR | Air Score | Percentage of number of days of air quality deemed “Poor” in the given year | Negative | float | 0-1.0 |
| VERY POOR | Air Score | Percentage of number of days of air quality deemed “Very Poor” in the given year | Negative | float | 0-1.0 |
| PM25_PCT_CHANGE | Air Score | Annual percentage change of the given region’s PM2.5 | Negative (a positive increase reduces the score) | float | -inf to inf |
| NO2_PCT_CHANGE | Air Score | Annual percentage change of the given region’s NO2 | Negative (a positive increase reduces the score) | float | -inf to inf |
| SO2_PCT_CHANGE | Air Score | Annual percentage change of the given region’s SO2 | Negative (a positive increase reduces the score) | float | -inf to inf |
| CO_PCT_CHANGE | Air Score | Annual percentage change of the given region’s CO | Negative (a positive increase reduces the score) | float | -inf to inf |
| O3_PCT_CHANGE | Air Score | Annual percentage change of the given region’s O3 | Negative (a positive increase reduces the score) | float | -inf to inf |
| EXTREMELY_HOT | Temperature Score | Proportion of extremely hot days in a year | Negative (more extremely hot days reduce the score) | float | 0-1.0 |
| EXTREMELY_HOT_PCT_CHANGE | Temperature Score | Annual percentage change of the proportion of extremely hot days in a year | Negative (a positive increase reduces the score) | float | -inf to inf |
| EXTREMELY_COLD | Temperature Score | Proportion of extremely cold days in a year | Negative (more extremely cold days reduce the score) | float | 0-100 |
| EXTREMELY_COLD_PCT_CHANGE | Temperature Score | Annual percentage change of the proportion of extremely cold days in a year | Negative (a positive increase reduces the score) | float | -inf to inf |
| MAX_TEMP_PCT_CHANGE | Temperature Score | Annual percentage change of the maximum temperature recorded in a year | Negative (a positive increase reduces the score) | float | -inf to inf |
| PRECIPITATION_MAX_PCT_CHANGE | Temperature Score | Annual percentage change of the maximum amount of precipitation in a year | Negative (a positive increase reduces the score) | float | -inf to inf |
| EXTREMELY_WET | Temperature Score | Proportion of extremely wet days in a year | Negative (more extremely wet days reduce the score) | float | 0-1.0 |
| EXTREMELY_WET_PCT_CHANGE | Temperature Score | Annual percentage change of the proportion of extremely wet days in a year | Negative (a positive increase reduces the score) | float | -inf to inf |
| EXTREMELY_DRY | Temperature Score | Proportion of extremely dry days in a year | Negative (more extremely dry days reduce the score) | float | 0-1.0 |
| EXTREMELY_DRY_PCT_CHANGE | Temperature Score | Annual percentage change of the proportion of extremely dry days in a year | Negative (a positive increase reduces the score) | float | -inf to inf |
| GREEN_PCT | Forest Score | Percentage of green space in a given region | Positive (greater percentage of green space contributes to a higher score) | float | 0-100 |
| GREEN_COVER_GROWTH | Forest Score | Annual growth in the green space | Positive (increase in green cover contributes to a higher score) | float | -inf to inf |
| BUILT_PCT | Forest Score | Percentage of built space in a given region | Negative if there’s a decrease in GREEN_COVER_GROWTH, otherwise 0 | float | -100 to 100 |
| BUILT_COVER_GROWTH | Forest Score | Annual growth in the built space | Negative if there’s a decrease in GREEN_COVER_GROWTH, otherwise 0 | float | -inf to inf |

These indicators are first standardized to have a mean of 0 and a standard deviation of 1, and then principal component analysis (PCA) is used to combine them into their respective component scores. This is done to reduce dimensionality and deal with multicollinearity among the variables. After PCA, the weighted average of the component scores is computed as the environmental score.

Note that indicators which are harmful to the environment (like high ozone or PM2.5 levels, or extreme temperatures) are multiplied by -1 before analysis so that they decrease, not increase, the environmental score.
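The standardization and sign inversion described above can be sketched as follows. The function names and sample values are hypothetical, and the `HARMFUL` set lists only a few of the negatively oriented indicators for illustration.

```python
import math

# Subset of negatively oriented indicators, for illustration only.
HARMFUL = {"POOR", "VERY POOR", "PM25_PCT_CHANGE", "EXTREMELY_HOT"}

def zscore(values):
    """Standardize to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def orient(indicators):
    """Standardize each indicator and flip the sign of the harmful ones."""
    oriented = {}
    for name, values in indicators.items():
        z = zscore(values)
        oriented[name] = [-v for v in z] if name in HARMFUL else z
    return oriented

# Hypothetical shares of "Good" and "Poor" days for three regions.
prepared = orient({"GOOD": [0.2, 0.4, 0.6], "POOR": [0.1, 0.2, 0.3]})
```

After this step a higher value always means a better environmental outcome, so the region with the fewest "Poor" days has the highest oriented POOR value. Flipping the sign before or after standardizing gives the same result.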

3 Schema

| Indicator | Description | Data Type | Range of Values | Admin Level | Full / Reduced |
|---|---|---|---|---|---|
| PROVINCE | Official name of given Province or Administrative level 1 | str | N/A (depends on chosen country) | Both | Both |
| PROVINCE_ID | Administrative level 1 ID | str | N/A (depends on chosen country) | Both | Both |
| DISTRICT | Official name of given District or Administrative level 2 | str | N/A (depends on chosen country) | 2 | Both |
| DISTRICT_ID | Administrative level 2 ID | str | N/A (depends on chosen country) | 2 | Both |
| YEAR | Year | int | 2019-2022 | Both | Both |
| GOOD | Percentage of number of days of air quality deemed “Good” in the given year | float | 0-1.0 | Both | Full |
| SATISFACTORY | Percentage of number of days of air quality deemed “Satisfactory” in the given year | float | 0-1.0 | Both | Full |
| MODERATE | Percentage of number of days of air quality deemed “Moderate” in the given year | float | 0-1.0 | Both | Full |
| POOR | Percentage of number of days of air quality deemed “Poor” in the given year | float | 0-1.0 | Both | Full |
| VERY POOR | Percentage of number of days of air quality deemed “Very Poor” in the given year | float | 0-1.0 | Both | Full |
| PM25_PCT_CHANGE | Annual percentage change of the given region’s PM2.5 | float | -inf to inf | Both | Full |
| NO2_PCT_CHANGE | Annual percentage change of the given region’s NO2 | float | -inf to inf | Both | Full |
| SO2_PCT_CHANGE | Annual percentage change of the given region’s SO2 | float | -inf to inf | Both | Full |
| CO_PCT_CHANGE | Annual percentage change of the given region’s CO | float | -inf to inf | Both | Full |
| O3_PCT_CHANGE | Annual percentage change of the given region’s O3 | float | -inf to inf | Both | Full |
| EXTREMELY_HOT | Proportion of extremely hot days in a year | float | 0-1.0 | Both | Full |
| EXTREMELY_HOT_PCT_CHANGE | Annual percentage change of the proportion of extremely hot days in a year | float | -inf to inf | Both | Full |
| EXTREMELY_COLD | Proportion of extremely cold days in a year | float | 0-100 | Both | Full |
| EXTREMELY_COLD_PCT_CHANGE | Annual percentage change of the proportion of extremely cold days in a year | float | -inf to inf | Both | Full |
| MAX_TEMP_PCT_CHANGE | Annual percentage change of the maximum temperature recorded in a year | float | -inf to inf | Both | Full |
| PRECIPITATION_MAX_PCT_CHANGE | Annual percentage change of the maximum amount of precipitation in a year | float | -inf to inf | Both | Full |
| EXTREMELY_WET | Proportion of extremely wet days in a year | float | 0-1.0 | Both | Full |
| EXTREMELY_WET_PCT_CHANGE | Annual percentage change of the proportion of extremely wet days in a year | float | -inf to inf | Both | Full |
| EXTREMELY_DRY | Proportion of extremely dry days in a year | float | 0-1.0 | Both | Full |
| EXTREMELY_DRY_PCT_CHANGE | Annual percentage change of the proportion of extremely dry days in a year | float | -inf to inf | Both | Full |
| GREEN_PCT | Percentage of green space in a given region | float | 0-100 | Both | Full |
| GREEN_COVER_GROWTH | Annual growth in the green space | float | -inf to inf | Both | Full |
| BUILT_PCT | Percentage of built space in a given region | float | -100 to 100 | Both | Full |
| BUILT_COVER_GROWTH | Annual growth in the built space | float | -inf to inf | Both | Full |
| AIR_SCORE | Composite score of Air Quality indicators | float | 0-100 | Both | Both |
| FOREST_SCORE | Composite score of deforestation indicators | float | 0-100 | Both | Both |
| TEMP_SCORE | Composite score of weather/temperature indicators | float | 0-100 | Both | Both |
| ENVR_SCORE | Composite score of AIR_SCORE, TEMP_SCORE, and FOREST_SCORE | float | 0-100 | Both | Both |
| LUMINOSITY | Sum of nighttime luminosity of the region | float | 0 to inf | Both | Full |
| POPULATION | Sum of the region’s population | int | 0 to inf | Both | Full |
| LPC | Luminosity per capita | float | >0 | Both | Full |
| LPC_PCT_CHANGE | Annual percentage change of the luminosity per capita | float | -inf to inf | Both | Full |
| ECON_SCORE | Economic Score | float | 0-100 | Both | Both |
| ECON_LPC_STD | z-score of LPC | float | -inf to inf | Both | Both |
| ECON_LPC_PCT_CHANGE_STD | z-score of LPC_PCT_CHANGE | float | -inf to inf | Both | Both |
| AIR_PM25_SUBINDEX_STD | z-score of PM25_SUBINDEX | float | -inf to inf | Both | Both |
| AIR_NO2_SUBINDEX_STD | z-score of NO2_SUBINDEX | float | -inf to inf | Both | Both |
| AIR_CO_SUBINDEX_STD | z-score of CO_SUBINDEX | float | -inf to inf | Both | Both |
| AIR_SO2_SUBINDEX_STD | z-score of SO2_SUBINDEX | float | -inf to inf | Both | Both |
| AIR_O3_SUBINDEX_STD | z-score of O3_SUBINDEX | float | -inf to inf | Both | Both |
| TEMP_EXTREMELY_HOT_STD | z-score of EXTREMELY_HOT | float | -inf to inf | Both | Both |
| TEMP_EXTREMELY_COLD_STD | z-score of EXTREMELY_COLD | float | -inf to inf | Both | Both |
| TEMP_MAX_TEMP_STD | z-score of MAX_TEMP | float | -inf to inf | Both | Both |
| TEMP_PRECIPITATION_MAX_STD | z-score of PRECIPITATION_MAX | float | -inf to inf | Both | Both |
| TEMP_EXTREMELY_WET_STD | z-score of EXTREMELY_WET | float | -inf to inf | Both | Both |
| TEMP_EXTREMELY_DRY_STD | z-score of EXTREMELY_DRY | float | -inf to inf | Both | Both |
| FOREST_GREEN_PCT_STD | z-score of GREEN_PCT | float | -inf to inf | Both | Both |
| FOREST_GREEN_COVER_GROWTH_STD | z-score of GREEN_COVER_GROWTH | float | -inf to inf | Both | Both |
| ECON_BUILT_PCT_STD | z-score of BUILT_PCT | float | -inf to inf | Both | Both |
| ECON_BUILT_COVER_GROWTH_STD | z-score of BUILT_COVER_GROWTH | float | -inf to inf | Both | Both |

4 Generative AI & Policy

Generative AI is a type of Artificial Intelligence that is capable of generating content. It is powered by large-scale data and AI foundation models built upon millions of valuable and reliable data points. These models can multi-task, performing tasks that are not necessarily straightforward (e.g. summarization, Q&A, classification, analysis, and recommendations). The infrastructure is now available to use these large models and adapt them to specific, targeted use cases based on the needs of a person or organization. Due to the large amount of resources available online, this requires minimal training data, so the potential for personalizing Generative AI services is vast. Still, using Generative AI for specific use cases is a relatively new industry. It is constantly changing, and the possibilities for its implementation grow day by day. Working in this field therefore requires consistent adaptation and learning.

In terms of policy making, generative AI can be a catalyst for more efficient, inclusive and responsive government through many services such as:

  • Digitizing services
  • Efficiently analysing large amounts of data
  • Effectively summarizing large datasets and highlighting their implications
  • Summarizing location and subject-specific policy documents

5 Generative AI within the GED

We decided to leverage the GED capabilities and the capabilities of Generative AI to produce accurate, relevant, and useful insights and recommendations for a policy maker.

This would allow policy makers to:

  • Quickly make sense of the large amounts of data found in the GED
  • Focus on the most important indicators and their significance
  • Highlight the practical effectiveness of the GED indicators
  • Most importantly, derive accurate, relevant and specific policy recommendations based on these indicators

So far, we have successfully produced a prototype (focusing on Serbia) that works on the municipality/district level and is capable of:

  • Summarizing the current economic and environmental scores of the municipality
  • Briefly describing the trends in the environmental score based on its three components
  • Describing the provincial and national ranking of each municipality based on its economic and environmental scores (i.e. how the economic and environmental scores of the municipality rank compared to other municipalities in the same district and to other municipalities across the country)
  • Providing economic and environmental recommendations based on simple background information extracted about the municipality and on the indicators described above

What technology do we currently use?

  • OpenAI API (paid)
  • Google PaLM API (free trial for now, but not available across the EU)
  • LangChain

Projected Current Costs:

| Output | Per Municipality | For the whole of Serbia (160 Municipalities) |
|---|---|---|
| Economic Trend Description | $0.02 | $3.20 |
| Economic Trend Description | $0.031 | $4.96 |
| Rankings | $0.04 | $6.40 |
| Recommendations | $0.002 | $0.32 |
| Total | $0.093 | $14.88 |

If you have any questions regarding the above methodologies, please leave feedback by following the link below.