1 Spatial databases
A database management system (DBMS) allows users to store, insert, delete, and update information in a database. Spatial databases go a step further because they record data with geographic coordinates.
From Esri Geodatabase to PostGIS, spatial databases have quickly become the primary method of managing spatial data.
To learn more about spatial databases, check out the resources below:
- Wikipedia: Spatial database
- 7 Spatial Databases for Your Enterprise
- GISGeography: Spatial Databases – Build Your Spatial Data Empire
- Esri: What is a geodatabase?
- Introduction to PostGIS
- PostGEESE? Introducing The DuckDB Spatial Extension
1.1 DuckDB
DuckDB is an in-process SQL OLAP database management system. It is designed to be used as an embedded database in applications, but it can also be used as a standalone SQL database.
- In-process SQL means that DuckDB’s features run in your application, not an external process to which your application connects. In other words: there is no client sending instructions nor a server to read and process them. SQLite works the same way, while PostgreSQL, MySQL…, do not.
- OLAP stands for OnLine Analytical Processing, and Microsoft defines it as a technology that organizes large business databases and supports complex analysis. It can be used to perform complex analytical queries without negatively affecting transactional systems.
DuckDB is a great option if you’re looking for a serverless data analytics database management system.


2 Introduction to SQL
This week, we will be familiarizing ourselves with DuckDB, learning how to use SQL within a Jupyter notebook, and loading in spatial data. Please work through the following notebook to get started:
3 OPTIONAL - Downloading PostgreSQL and PostGIS locally for Windows and MacOS
This module will primarily use DuckDB to explore spatial data. However, if you would like to learn more about PostgreSQL and PostGIS, you can follow the instructions below to install them on your local machine.
This workshop uses a data bundle. Download it and extract to a convenient location.
To explore the PostgreSQL/PostGIS database, and learn about writing spatial queries in SQL, we will need some software, either installed locally or available remotely on the cloud.
- There are instructions below on how to access PostgreSQL for installation on Windows or MacOS. PostgreSQL for Windows and MacOS either include PostGIS or have an easy way to add it on.
- There are instructions below on how to install PgAdmin. PgAdmin is a graphical database explorer and SQL editor which provides a "user facing" interface to the database engine that does all the world.
For always up-to-date directions on installing PostgreSQL, go to the PostgreSQL download page and select the operating system you are using.
3.1 PostgreSQL for Microsoft Windows
For a Windows install:
- Go to the Windows PostgreSQL download page. 
- Select the latest version of PostgreSQL and save the installer to disk. 
- Run the installer and accept the defaults. 
- Find and run the "StackBuilder" program that was installed with the database. 
- Select the "Spatial Extensions" section and choose latest "PostGIS ..Bundle" option.  - image 
- Accept the defaults and install. 
3.2 PostgreSQL for Apple MacOS
For a MacOS install:
- Go to the Postgres.app site, and download the latest release. 
- Open the disk image, and drag the Postgres icon into the Applications folder.  - image 
- In the Applications folder, double-click the Postgres icon to start the server. 
- Click the Initialize button to create a new blank database instance.  {.inline, .border .inline, .border} {.inline, .border .inline, .border}
- In the Applications folder, go to the Utilities folder and open Terminal. 
- Add the command-line utilities to your PATH for convenience. 
sudo mkdir -p /etc/paths.d echo /Applications/Postgres.app/Contents/Versions/latest/bin | sudo tee /etc/paths.d/postgresapp
3.3 PgAdmin for Windows and MacOS
PgAdmin is available for multiple platforms, at https://www.pgadmin.org/download/.
- Download and install the latest version for your platform. 
- Start PgAdmin!  - image 
4 Creating a Spatial Database
4.1 PgAdmin
PostgreSQL has a number of administrative front-ends. The primary one is psql, a command-line tool for entering SQL queries. Another popular PostgreSQL front-end is the free and open source graphical tool pgAdmin. All queries done in pgAdmin can also be done on the command line with psql. pgAdmin also includes a geometry viewer you can use to spatial view PostGIS queries.
- Find pgAdmin and start it up.  - image 
- If this is the first time you have run pgAdmin, you probably don't have any servers configured. Right click the - Serversitem in the Browser panel.- We'll name our server PostGIS. In the Connection tab, enter the - Host name/address. If you're working with a local PostgreSQL install, you'll be able to use- localhost. If you're using a cloud service, you should be able to retrieve the host name from your account.- Leave Port set at - 5432, and both Maintenance database and Username as- postgres. The Password should be what you specified with a local install or with your cloud service. - image 
4.2 Creating a Database
- Open the Databases tree item and have a look at the available databases. The - postgresdatabase is the user database for the default postgres user and is not too interesting to us.
- Right-click on the - Databasesitem and select- New Database. - image 
- Fill in the - Create Databaseform as shown below and click OK.- Name - nyc- Owner - postgres - image 
- Select the new - nycdatabase and open it up to display the tree of objects. You'll see the- publicschema. - image 
- Click on the SQL query button indicated below (or go to Tools > Query Tool).  - image 
- Enter the following query into the query text field to load the PostGIS spatial extension: - CREATE EXTENSION postgis;
- Click the Play button in the toolbar (or press F5) to "Execute the query." 
- Now confirm that PostGIS is installed by running a PostGIS function: - SELECT postgis_full_version();
You have successfully created a PostGIS spatial database!!
4.3 Function List
PostGIS_Full_Version: Reports full PostGIS version and build configuration info.
5 Loading spatial data
Supported by a wide variety of libraries and applications, PostGIS provides many options for loading data.
We will first load our working data from a database backup file, then review some standard ways of loading different GIS data formats using common tools.
5.1 Loading the Backup File
- In the PgAdmin browser, right-click on the nyc database icon, and then select the Restore... option.  {.inline, .border .inline, .border} {.inline, .border .inline, .border}
- Browse to the location of your workshop data data directory (available in the workshop data bundle), and select the - nyc_data.backupfile. {.inline, .border .inline, .border} {.inline, .border .inline, .border}
- Click on the Restore options tab, scroll down to the Do not save section and toggle Owner to Yes.  {.inline, .border .inline, .border} {.inline, .border .inline, .border}
- Click the Restore button. The database restore should run to completion without errors.  {.inline, .border .inline, .border} {.inline, .border .inline, .border}
- After the load is complete, right click the nyc database, and select the Refresh option to update the client information about what tables exist in the database.  {.inline, .border .inline, .border} {.inline, .border .inline, .border}
Note
If you want to practice loading data from the native spatial formats, instead of using the PostgreSQL db backup files just covered, the next couple of sections will guide you thru loading using various command-line tools and QGIS DbManager. Note you can skip these sections, if you have already loaded the data with pgAdmin.
5.2 Shapefiles? What's that?
You may be asking yourself -- "What's this shapefile thing?" A "shapefile" commonly refers to a collection of files with .shp, .shx, .dbf, and other extensions on a common prefix name (e.g., nyc_census_blocks). The actual shapefile relates specifically to files with the .shp extension. However, the .shp file alone is incomplete for distribution without the required supporting files.
Mandatory files:
- .shp—shape format; the feature geometry itself
- .shx—shape index format; a positional index of the feature geometry
- .dbf—attribute format; columnar attributes for each shape, in dBase III
Optional files include:
- .prj—projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format
The shp2pgsql utility makes shape data usable in PostGIS by converting it from binary data into a series of SQL commands that are then run in the database to load the data.
5.3 Loading with shp2pgsql
The shp2pgsql converts Shape files into SQL. It is a conversion utility that is part of the PostGIS code base and ships with PostGIS packages. If you installed PostgreSQL locally on your computer, you may find that shp2pgsql has been installed along with it, and it is available in the executable directory of your installation.
Unlike ogr2ogr, shp2pgsql does not connect directly to the destination database, it just emits the SQL equivalent to the input shape file. It is up to the user to pass the SQL to the database, either with a "pipe" or by saving the SQL to file and then loading it.
Here is an example invocation, loading the same data as before:
export PGPASSWORD=mydatabasepassword
shp2pgsql \
  -D \
  -I \
  -s 26918 \
  nyc_census_blocks_2000.shp \
  nyc_census_blocks_2000 \
  | psql dbname=nyc user=postgres host=localhostHere is a line-by-line explanation of the command.
shp2pgsql \The executable program! It reads the source data file, and emits SQL which can be directed to a file or piped to psql to load directly into the database.
-D \The D flag tells the program to generate "dump format" which is much faster to load than the default "insert format".
-I \The I flag tells the program to create a spatial index on the table after loading is complete.
-s 26918 \The s flag tells the program what the "spatial reference identifier (SRID)" of the data is. The source data for this workshop is all in "UTM 18", for which the SRID is 26918 (see below).
nyc_census_blocks_2000.shp \The source shape file to read.
nyc_census_blocks_2000 \The table name to use when creating the destination table.
| psql dbname=nyc user=postgres host=localhostThe utility program is generating a stream of SQL. The "|" operator takes that stream and uses it as input to the psql database terminal program. The arguments to psql are just the connection string for the destination database.
5.4 SRID 26918? What's with that?
Most of the import process is self-explanatory, but even experienced GIS professionals can trip over an SRID.
An "SRID" stands for "Spatial Reference IDentifier." It defines all the parameters of our data's geographic coordinate system and projection. An SRID is convenient because it packs all the information about a map projection (which can be quite complex) into a single number.
You can see the definition of our workshop map projection by looking it up either in an online database,
or directly inside PostGIS with a query to the spatial_ref_sys table.
SELECT srtext FROM spatial_ref_sys WHERE srid = 26918;Note
The PostGIS spatial_ref_sys table is an OGC-standard table that defines all the spatial reference systems known to the database. The data shipped with PostGIS, lists over 3000 known spatial reference systems and details needed to transform/re-project between them.
In both cases, you see a textual representation of the 26918 spatial reference system (pretty-printed here for clarity):
PROJCS["NAD83 / UTM zone 18N",
  GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
      SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],
      AUTHORITY["EPSG","6269"]],
    PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4269"]],
  UNIT["metre",1,AUTHORITY["EPSG","9001"]],
  PROJECTION["Transverse_Mercator"],
  PARAMETER["latitude_of_origin",0],
  PARAMETER["central_meridian",-75],
  PARAMETER["scale_factor",0.9996],
  PARAMETER["false_easting",500000],
  PARAMETER["false_northing",0],
  AUTHORITY["EPSG","26918"],
  AXIS["Easting",EAST],
  AXIS["Northing",NORTH]]If you open up the nyc_neighborhoods.prj file from the data directory, you'll see the same projection definition.
Data you receive from local agencies—such as New York City—will usually be in a local projection noted by "state plane" or "UTM". Our projection is "Universal Transverse Mercator (UTM) Zone 18 North" or EPSG:26918.
5.5 Things to Try: View data using QGIS
QGIS, is a desktop GIS viewer/editor for quickly looking at data. You can view a number of data formats including flat shapefiles and a PostGIS database. Its graphical interface allows for easy exploration of your data, as well as simple testing and fast styling.
Try using this software to connect your PostGIS database. The application can be downloaded from https://qgis.org
You'll first want to create a connection to a PostGIS database using menu Layer->Add Layer->PostGIS Layers->New and then filling in the prompts. Once you have a connection, you can add Layers by clicking connect and selecting a table to display.
5.6 Loading data using QGIS DbManager
QGIS comes with a tool called DbManager that allows you to connect to various different kinds of databases, including a PostGIS enabled one. After you have a PostGIS Database connection configured, go to Database->DbManager and expand to your database as shown below:
 {.inline, .border .inline, .border}
{.inline, .border .inline, .border}
From there you can use the Import Layer/File menu option to load numerous different spatial formats. In addition to being able to load data from many spatial formats and export data to many formats, you can also add ad-hoc queries to the canvas or define views in your database, using the highlighted wrench icon.
This section is based on the PostGIS Intro Workshop, sections 3, 4, 5,and 7
