The former Division of Government Research (DGR) at UNM developed a special purpose gravity model for measuring geographic access to health care facilities and providers in New Mexico. This work was performed for the former New Mexico Health Policy Commission (NM HPC) from 1988 through 2002. The results of this preliminary work were only published on DGR's former web page and also in a limited distribution publication by the NM HPC ( HPC Quick Facts 2003 - color extract). A special poster presentation was also prepared that won the poster contest at the 2002 ESRI SWUG Conference held in Taos, New Mexico ( now Esri Southwest User Conference).
The gravity model was very innovative and potentially useful in many other places beyond New Mexico where having a higher resolution, more accurate and reliable measure of geographic accessibility to health care would benefit policy decisions. This research will continue the development of this gravity model (see the Story Map) and also compare it with results obtained from recently developed techniques used by other researchers. The original gravity model was developed using (SAS) and ESRI's ArcGIS. These updates will be done using ArcGIS ModelBuilder, Python scripts, R, and now with ArcGIS Pro.
Note: Most of the effort so far has concentrated on technical development, learning and applying various computing programming and statistical software facilities. The resulting models and scripts will eventually be made available for other researchers to use and modify. Hopefully these technical developments will clearly demonstrate the use of this improving GIS technology and encourage state agencies to resolve data quality and data sharing issues. Esri has also been encouraging government agencies to participate in their Open Data program which hopefully may result in more New Mexico data becoming available for public use in the future. For more recent official information about the status of New Mexico's health care workforce please see New Mexico Health Care Workforce Committee Annual Reports. Also see Modeling Future Health Care Workforce Adequacy to Inform Policy.
I recently made progress developing an ArcGIS ModelBuilder version of this gravity model using ZIP Codes and older provider data (previously available). For now, there is a map (PDF) of the preliminary results of Population per Primary Care Physician plus a (PDF) showing the design and components used for ArcGIS ModelBuilder available below. Most recently, I have begun the process of developing a Python script tool and there is documentation with screen shots available below that will be updated as this process continues (also see the class presentation). Currently, the Python script tool works well except that I am having problems fully automating the symbology with user defined input for the class breaks. For now, the last step is a manual process where symbology can be modified from the default natural breaks classification. I hope to resolve this problem with ESRI technical support in the future.
The results still need to be verified for accuracy by comparing them to those obtained using the existing SAS Macro developed at DGR. However, I no longer have access to a Windows version of SAS as I had for many years as a UNM staff employee. At UNM there is no student version of SAS available for installation on personal computers like at many other universities such as PSU, BU, Texas A&M, and Stanford. The Windows version is only available for student use in a few computer pods and for departmental use purchased with a UNM account. There is a Linux version, but disk space quotas limit its utility. SAS recently relased a new University Edition that will allow comparision, but it lacks the SAS Bridge that works with the Windows version that would be very helpful. I still hope to use the SAS Bridge in combination with ArcGIS ModelBuilder and Python scripts (add-ins). I'm not sure yet which will prove to be more efficient. I still intend to make the ArcGIS ModelBuilder and Python scripts available sometime in the future once properly documented and packaged for others to use.
I still plan some more developments using census tracts and geographic units other than ZIP Codes such as the New Mexico Small Areas developed by the New Mexico Department of Health (see the Story Map - first version). Hopefully, geocoded provider and facility data can be obtained in the future that would allow comparision of results with other established techniques such as IDW, Kernel Density and the two-step floating catchment area method (2SFCA). Also obtaining recent provider and facility data for the border areas of states adjacent to New Mexico (Arizona, Colorado, and Texas) would provide more reliable results as many patients cross the state boundary and some also go to Mexico for care. The boundary/edge problem is a very common methodological problem that should be addressed in the proper design of spatial analysis applications. Further, if a good New Mexico road network database were developed for use in the ArcGIS Network Analyst , actual travel distance and time on the roads can be used instead of Euclidean distances. It would be interesting to compare results and see if any major improvements become apparent given the effort involved in creating a good road network dataset for the state. However, this effort is sorely needed for many other network based applications beyond just those for healthcare.
Given some minor problems that still need to be resolved with ArcGIS (mostly automating symbology) I have begun exploring using QGIS and Python for the health care gravity model. A general Python version of the gravity model is currently being developed that will allow for the use of both ZIP Codes, census tracts, and the New Mexico (NMDOH) Small Areas for any type of health care providers and facilities. I hope to have versions that that will work in both ArcGIS and QGIS. I will post results as I become more familiar with this open source GIS in the future.
I have also tested a revised version of the SAS gravity model using census tracts and the New Mexico (NMDOH) Small Areas with 2002 population estimates and 2002 primary care physician data (see Small Area Map and Census Tract Map). Note: A spatial join was used to assign ZIP Code physician data to census tracts and NMDOH small area. In the future, geocoded physician locations would be more accurate. However, given the interpolation performed by the gravity model I think the results would not be very different. Some of these results will be available on ArcGIS Online in several GIS formats (interactive web maps, map package, map services, map layers, and shapefiles) as development continues. I was recently provided with an organizational account when I renewed my home use license for ArcGIS which I never had before. Thanks esri!
I have prepared a Python (GDAL/OGR) script that reads a shapefile, calculates the gravity model, and writes a new shapefile with original data and results. I have mostly finished converting this python script into a QGIS processing script to allow for more user interaction and eventually automated display of results (see screenshot of curently working user interface and documentation). I may also consider the option of preparing a QGIS graphic modeler once the QGIS processing script is completed. I have also begun developing a QGIS plugin (see documentation) based on these scripts that will be made available for public use (see the QGIS Plugin Repository). The shared plugin will be generic (could be used in places other than New Mexico). For now only a screenshot of the results from QGIS are available until the plugin is completed. In addition, more developments using Ubuntu-Linux and other open source geospatial software are planned for the future. Note: Although QGIS developments are being prepared using Ubuntu-Linux they will also work on the Windows and Mac versions (see screenshot of the Windows processing script user interface). Links below with these developments will be posted soon!.
As I now have working Python code QGIS(PyQGIS) for the DGR gravity model I have also developed a mostly Python script (see documentation) using ArcGIS(ArcPy) and have started developing a R function. My earlier development of an ArcGIS ModelBuilder application and Python script tool derived from ModelBuilder proved too cumbersome and slow although it worked well. As both ArcGIS (R-ArcGIS) and QGIS (external application) have developed improved interfaces for R, now is a good time to do some more development with this widely used open source statistical software.
I have derived an OD Cost Matrix for all of the New Mexico census tracts using the Origin Destination Cost Matrix Tool. The DGR gravity model can now be applied with road based distances. For now, I will use the 2002 primary care physician data that was previously available during the earlier work performed by DGR for the New Mexico Health Policy Commission (NM HPC) from 1988 through 2002. Although this previous work was based on ZIP Codes, recent adjustments ( spatial join) have been performed (see the Story Map) to use census tracts and 2002 population estimates from UNM GPS. These results are very informative (see 2002 Gravity Model - Road Distance Map). The census tract test case results can be further modified to disregard county boundaries using the ArcGIS Inverse Distancw Weighted Tool (see IDW - Road Distance Map), but this smoother statistical surface should be interpreted with more caution. A Web Map and Web Map Application). are also available with these results. If the recent physician data for census tracts (499) or ZIP Codes (426) are eventually made available, the DGR gravity model results can provide a higher resolution and more useful picture of geographic accessibility.
Begun to focus on the primary research goal, to statistically compare various geographic access and gravity models (DGR, 2SFCA, and E2SFCA) at various levels of geography including those with higher resolution than New Mexico counties (by census tract, ZIP Code, and perhaps census block group). Also to compare the results for each model using both Euclidean and road based distances. Preliminary results have been informative and somewhat suprising. There are pronounced differences ( measured using both relative difference and correlation coefficent) between the results when comparing Euclidean and road distances for the DGR gravity model at the county (r = 0.8088695) and census tract (r = 0.6584075) levels of geography (see 2015 County Relative Difference Map or 2002 Census Tract Relative Difference Map or Web Map or Web Map Application). Obviously accessibility depends on where you are located with respect to the network of roads. Assumming equal accessibility as measured by Euclidean distance (isotrophic surface) is not very realistic. The geography of New Mexico supports this, although I did not think the differences woud be as large as they are in some places.
I have started using the Two-Step Floating Catchment Area (2SFCA) methods with the ArcGIS Desktop (ArcPy) script tools provided by Dr. Fahui Wang ( Wang, 2017). A Web Map showing results for the generalized 2SFCA method has been prepared using the 2002 primary care physician data that was previously available during the earlier work performed by DGR for the New Mexico Health Policy Commission (NM HPC) from 1988 through 2002. This class provided an excellent review and more in-depth background for various Python modules including Python - GeoPandas. I have re-programmed the generalized two-step floating catchment area methods using both SAS and Python3 - GeoPandas within Jupyter Notebook (see class PowerPoint Presentation). However, I did not make as much progress as I would have liked on the statistical comparison portion of my research project and more work will continue this summer and during the fall semester.
Exploratory Data Analysis: Some very preliminary results and data visualizations are now available that have been prepared using the SAS University Edition. These results (see summary statistics PDF) suggest that there is a significant difference in the results obtained from the 2SFCA methods compared with the 1SHGM method. However, these results based on a standard ANOVA should be interpreted with caution as several of the basic ANOVA assumptions are not met, primarily the independence of the data that are spatially autocorrelated and also not normally distributed plus non-common variances. More work is currently underway to research and apply more appropriate statistical methods. These additional statistical methods within R will be used to compare results from various models (one-step vs. two-step with various distance decay functions) and will be available as progress is made (see R Results). In addition, both absolute and relative differences will be calculated to investigate how results vary at each census tract location (see Preliminary Results (1SHGM and 2SFCA) Web Mapping Application - Note: a more efficient version is being prepared using the ArcGIS API for JavaScript).
Hybrid-Zonal Model Results with Various Distance Decay Functions: The boxplot (below) shows the distribution of Physicians per Population (PHYS_PER_POP) from the various models (Acc Methods - Accessibility Methods), all using a hybrid-zonal approach with various distance decay functions (1SED - 1SHGM with DGR type power distance decay; 2SEE - 2SFCA with exponential distance decay; 2SEG - 2SFCA with gaussian distance decay; 2SEP - 2SFCA with power distance decay). Note: the means and medians are close for the 2SFCA and only slightly different than the 1SHGM. It may be that the ANOVA results (can reject the null hypothesis that the means of the different methods are the same p < 0.0001 and accept the alternative hypothesis that at least one mean is not equal to the others) have been greatly influenced by the different variances and presence of extreme outliers.
Power Based Distance Decay Results: This boxplot (below) and histograms and boxplot PDF. shows the distribution of Physicians per Population (PHYS_PER_POP) from various models (Acc Methods - Accessibility Methods), using a hybrid-zonal approach with power based distance decay functions and also the distribution using the county based service area approach (COSVAR). This visual comparison indicates that the one-step methods (1SED and 1SEP) tend to produce more similar results to the widely used county based service area method (COSVAR) than a reasonably comparable two-step method (2SEP). An interactive web map has been prepared showing these results and the relative difference between the one-step and two-step methods at each census tract. There are also some special web mapping applications (see 1SEP and 2SEP, 1SED and 2SEP, and 1SED and 1SEP) that are more useful for comparing these results side-by-side. Note: The county based service area method (COSVAR) is only a population-to-physician ratio, it's the service capacity standard or "rational service area" used by the U.S. Department of Health and Human Services (DHHS) for defining physician shortage areas.
Exponential Distance Decay Results: This boxplot (below) and histograms and boxplot PDF. shows the distribution of Physicians per Population (PHYS_PER_POP) from various models (Acc Methods - Accessibility Methods), using a hybrid-zonal approach with exponential based distance decay functions and also the distribution using the county based service area approach (COSVAR). This visual comparison indicates that the one-step method (1SEE) tends to produce more similar results to the widely used county based service area method (COSVAR) than a reasonably comparable two-step method (2SEE). An interactive web map has been prepared showing these results and the relative difference between the one-step and two-step methods at each census tract. There is also a web mapping application that is more useful for comparing these results side-by-side. Note: The county based service area method (COSVAR) is only a population-to-physician ratio, it's the service capacity standard or "rational service area" used by the U.S. Department of Health and Human Services (DHHS) for defining physician shortage areas.
Gaussian Distance Decay Results: This boxplot (below) and histograms and boxplot PDF. shows the distribution of Physicians per Population (PHYS_PER_POP) from various models (Acc Methods - Accessibility Methods), using a hybrid-zonal approach with Gaussian based distance decay functions and also the distribution using the county based service area approach (COSVAR). This visual comparison indicates that the one-step method (1SEG) tends to produce more similar results to the widely used county based service area method (COSVAR) than a reasonably comparable two-step method (2SEG). An interactive web map has been prepared showing these results and the relative difference between the one-step and two-step methods at each census tract. There is also a web mapping application that is more useful for comparing these results side-by-side. Note: The county based service area method (COSVAR) is only a population-to-physician ratio, it's the service capacity standard or "rational service area" used by the U.S. Department of Health and Human Services (DHHS) for defining physician shortage areas.
Summary of Results: My initial goal for this research project was to compare the one-step and two-step models given an assumption or hypothesis that there would not be any significant differences as both are essentially gravity models. However, I have been surprised by the results. There is clear evidence of a significant difference between the one-step and two-step models that have similar constructs (hybrid-zonal) and employ the same distance decay methods (power, exponential, or Gaussian).
It is important to note that there is no indication that either model is a better measure of reality than the other. This can perhaps only be determined when actual observational provider data records are reviewed, or statistical patient-based surveys are conducted . As these models are only approximations of reality, a given model can only be evaluated as somewhat more appropriate or suitable than the other given the availability and quality of input data, plus how well it matches the desired usage or purpose.
This project has been an exceptional learning exercise that has enabled me to gain a better understanding of statistical, spatial, and GIS software and techniques. I have focused on presenting the results as objectively as possible. However, I do have some personal observations that are worth presenting for discussion purposes. Hopefully other researchers will review these results and make future suggestions that will aid in my interpretations.
The one-step models are easier to operationalize but conceptually provide lower resolution than the two-step models. They can require using less computational resources and GIS facilities. However, it is necessary to calculate road-based distances and use a GIS to display results on a map. Also standard statistical packages and programming environments can be used to perform most of the calculations. The results seem to be more similar to the traditional county-based service capacity standards used by the federal government for measuring physician shortage areas. The mean values are similar to the statewide means and have smaller variances. Although they appear to spread out the potential accessibility measures more evenly, they can also over-estimate in actual shortage areas. Regardless, the one-step model may be a good initial first step beyond the traditional county-based methods. They are easier to explain to decision makers such as state legislators as they are a simple extension of the standard service capacity ratio measure. A one-step model may be more suitable at the state level of geography than a similarly constructed two-step model. As such, it may be a more appropriate choice for healthcare planners seeking necessary increased funding.
The two-step models require more effort to operationalize but conceptually provides higher resolution than the one-step models. More computational resources and GIS facilities are required. A more involved programming effort within the GIS and related scripting environments is necessary as the calculations are more complicated. The resulting mean values are noticeably larger and variances greater than the traditional county-based service capacity standards used by the federal government for measuring physician shortage areas. They seem to not spread out potential accessibility measures as evenly as the one-step models and may perhaps over-estimate and severely under-estimate in some areas. Regardless, they represent a definite technological advancement that should be applied when appropriate. However, they are more difficult to explain to a non-professional audience. A two-step model may be more suitable at the urban and regional levels of geography than a similarly constructed one-step model. Several varieties of two-step models are continuing to be developed by academic researchers and their parctical applications have been demonstrated.
There is more research work to be undertaken in evaluating the similarities and differences (comparing and contrasting) between the one-step and two-step models. This study used the ZIP-Code (post office locations or centroids) to locate physicians because this older data was all that was available. More recent physician data, if available, could be used to more accurately geocode addresses. From a more technical perspective, the effects of both the modifiable areal unit problem (MAUP) and mathematical aspects of the calculations need more evaluation. It is also apparent that healthcare planners could benefit if better scripting applications were developed. These developments could make it easier to review results and allow for better selection of an appropriate model to employ.
I spent more time than expected this summer viewing selected technical sessions from the recent Esri Developer Summit and becoming familiar with the ArcGIS API for Python and the new Spatially Enabled DataFrame (SEDF). Replacing GeoPandas with the SEDF was easy and the calculations of a test 2SEE model were successful. However, the map visualization of results within Jupyter Notebook was more problematic. Using an ArcGIS Online WebMap with the Jupyter Notebook Mapping Widget (arcgis.widgets.MapView) was the best alternative as it allows for a map legend to also be displayed (see below). But I have had problems after adding a results layer directly to the webmap. I can't display the polygon symnology without saving it to ArcGIS Online for rendering with smart mapping and then loading it again in another jupyter notebook cell to display. I hope to learn more about the proper way to do this and resolve my problems shortly.
The additional statistical analyses of results has clearly shown that significant differences are apparent
in the comparisons of the one-step (1SHGM) and two-step (G2SFCA) models.
A series of scatterplots were prepared and correlation coefficients were calculated that support the results
obtained from the previous histograms and T-Tests. The table (see below) summarizes these results.
The most similar results when comparing the one-step and two-step models was for those that used a power
based distance decay method (1SEP and 2SEP). The one-step (1SEP) model also seemed to produce the most
realistic results with a mean value close to the statewide mean (0.6225) and a more even numerical
distribution of census tracts below (244) and at or above (255) the national proviter-to-population benchmark (0.79).
However, there was not a very even geographical distribution (see map below) as most of the
census tracts below the benchmark seem to be in the larger rural census tracts with some exceptions
in the southern and southeastern urban areas close to New Mexico's borders with Arizona, Texas and Mexico.
Additional research using spatial statistical
regression models is planned that will evaluate how results
from these geographic access models are related to various measures of
health inequality and health disparities.
Note: If you have problems viewing the PDFs below use another PDF reader other than Adobe Reader. See Foxit Reader that should work better. This should allow the text boxes on the gravity model poster to display properly.
DGR's Primary Care Physicians Gravity Model Poster (11 x 17 version)
DGR's Primary
Care Physicians Gravity Model Report, from HPC Quick Facts 2003
DGR's Gravity
Model Power Point Presentation (original HPC presentation)
Map of Population per Primary Care
Physician - ZIP Codes (ArcGIS ModelBuilder - Preliminary Version)
DGR's Gravity
Model (ArcGIS ModelBuilder - Preliminary Version)
DGR's Gravity
Model (ArcGIS Python Script Tool, ModelBuilder - Preliminary Version)
DGR's Gravity
Model (QGIS Python Processing Script - Preliminary Version)
DGR's Gravity
Model (QGIS Python Plugin - Preliminary Version)
DGR's Gravity
Model (ArcGIS Python Script Tool, ArcPy Non-ModelBuilder - Preliminary Version)
Primary Care Physicians, 2015
(Euclidean Distance Gravity Model Map - Counties)
Primary Care Physicians, 2015
(Road Distance Gravity Model Map - Counties)
Primary Care Physicians, 2015
(IDW - Road Distance Gravity Model Map - Counties)
Primary Care Physicians, 2002
(Road Distance Gravity Model Map - Tracts)
Primary Care Physicians, 2002
(IDW - Road Distance Gravity Model Map - Tracts)
DGR's Gravity
Model Web Page (original version - literature review - links need updates)
Geographic Access
to Primary Care Physicians (geography class presentation)
Map of Population per Primary Care
Physician - NMDOH Small Areas (SAS Macro - Preliminary Version)
Map of Population per Primary Care
Physician - Census Tracts (SAS Macro - Preliminary Version)
Geographic Access
to Family Practice Physicians (recent geography class presentation)
Larry Spear, Sr. Research Scientist (Ret.) Division of Government Research University of New Mexico Email: lspear@unm.edu lspearnm@gmail.com WWW: https://www.unm.edu/~lspear LinkedIn https://www.linkedin.com/in/larry-spear-93371970
Last Revised: 4/12/2022 Larry Spear (lspear@unm.edu)