2 Introduction to SQL

3 Getting Started

This week we’ll be learning how to use SQL to query data from a database:

Introduction to SQL

Clicking the link will take you to Google Colab, where you can run the notebook in your browser. You can also download the notebook and run it locally if you prefer.

Once you’ve completed this, you can test your knowledge by answering the questions in the lab below. You can also access it on Google Colab here.

4 Lab

Datasets:

The following datasets are used in this lab. You don’t need to download them manually, they can be accessed directly from the notebook.

# %pip install duckdb duckdb-engine jupysql

import duckdb

%load_ext sql

%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

4.1 Question 1: Creating Tables

Create a database, then write a SQL query to create a table named nyc_subway_stations and load the data from the file nyc_subway_stations.tsv into it. Similarly, create a table named nyc_neighborhoods and load the data from the file nyc_neighborhoods.tsv into it.

4.2 Question 2: Column Filtering

Write a SQL query to display the ID, NAME, and BOROUGH of each subway station in the nyc_subway_stations dataset.

4.3 Question 3: Row Filtering

Write a SQL query to find all subway stations in the nyc_subway_stations dataset that are located in the borough of Manhattan.

4.4 Question 4: Sorting Results

Write a SQL query to list the subway stations in the nyc_subway_stations dataset in alphabetical order by their names.

4.5 Question 5: Unique Values

Write a SQL query to find the distinct boroughs represented in the nyc_subway_stations dataset.

4.6 Question 6: Counting Rows

Write a SQL query to count the number of subway stations in each borough in the nyc_subway_stations dataset.

4.7 Question 7: Aggregating Data

Write a SQL query to list the number of subway stations in each borough, sorted in descending order by the count.

4.8 Question 8: Joining Tables

Write a SQL query to join the nyc_subway_stations and nyc_neighborhoods datasets on the borough name, displaying the subway station name and the neighborhood name.

4.9 Question 9: String Manipulation

Write a SQL query to display the names of subway stations in the nyc_subway_stations dataset that contain the word “St” in their names.

4.10 Question 10: Filtering with Multiple Conditions

Write a SQL query to find all subway stations in the nyc_subway_stations dataset that are in the borough of Brooklyn and have routes that include the letter “R”.

5 References

Qiusheng Wu