Tutorial: SQL

This tutorial will teach you the basics of SQL for querying relational databases.

Attachments

Tutorial

This tutorial is based on the Library Carpentry SQL tutorial, licensed under CC-BY 4.0.

You can run this Jupyter notebook in Google Colab or on your local installation.

Note: In Colab, if the notebook is View Only, remember to make a copy, so that you can modify it:

  • File –> Save a copy in Drive.

Attribution

Jordan Perdersen (Ed.), Kristin Lee (Ed.), Christopher Erdmann (Ed.), Lise Doucette (Ed.), Elaine Wong (Ed.), Janice Chan (Ed.), James Baker, Fernando Rios, Tim Dennis, Belinda Weaver, … orobecca. (2019, July). LibraryCarpentry/lc-sql: Library Carpentry: Introduction to SQL, June 2019 (Version v2019.06.1). Zenodo. http://doi.org/10.5281/zenodo.3266102

DOI DOI

Modifications

Extracts modified to Jupyter notebook use by Stian Soiland-Reyes https://orcid.org/0000-0001-9842-9718.

References

import sqlite3

Data ingestion

In this tutorial we assume data has already been ingested and cleaned. We’ll retrieve the already prepared SQLite database.

SQLite is a miniature relational database that can be used without a server installation, for instance by directly importing the SQLite library in Python and “connecting” it to a local file. It is worth noting that this means only a single process can access the file at a time (in this case our Jupyter notebook).

Download SQLite database

The Zenodo deposit https://doi.org/10.5281/zenodo.8360812 contains a ZIP file with the pre-populated database in SQLite format. As this tutorial is not showing “proper” data ingestion, we’ll just download and unzip using Linux shell commands:

!wget https://zenodo.org/records/8360812/files/doaj-article-sample-db.zip && unzip doaj-article-sample-db.zip
--2025-07-08 15:12:03--  https://zenodo.org/records/8360812/files/doaj-article-sample-db.zip
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.48.194, 188.185.43.25, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 947007 (925K) [application/octet-stream]
Saving to: ‘doaj-article-sample-db.zip’

doaj-article-sample 100%[===================>] 924.81K   499KB/s    in 1.9s    

2025-07-08 15:12:06 (499 KB/s) - ‘doaj-article-sample-db.zip’ saved [947007/947007]

Archive:  doaj-article-sample-db.zip
  inflating: doaj-article-sample.db  
  inflating: doaj-article-sample.db.sql  

Now let’s connect to this database from within Python using the sqlite3 library, and see which tables are included. In SQLite this can be done by inspecting the sqlite_master table, using the SELECT * query . We’ll use pandas as a helper to better view the SQL results in Jupyter Notebook.

import sqlite3
import pandas as pd

db = sqlite3.connect("doaj-article-sample.db")
pd.read_sql_query("SELECT * FROM sqlite_master", db)
typenametbl_namerootpagesql
0tablesqlite_sequencesqlite_sequence275CREATE TABLE sqlite_sequence(name,seq)
1tablelanguageslanguages408CREATE TABLE "languages" (\n "id" INTEGER NOT...
2tablepublisherspublishers679CREATE TABLE "publishers" (\n "id" INTEGER NO...
3tablelicenceslicences2CREATE TABLE "licences" (\n "id" INTEGER NOT ...
4tablejournalsjournals3CREATE TABLE "journals" (\n "id" INTEGER NOT ...
5tablearticlesarticles5CREATE TABLE "articles" (\n "id" INTEGER NOT ...

Introduction to SQL

In this tutorial we’ll follow the Library Carpentry guide, but run the queries from within Python instead of the DB Browser.

Selecting and querying data

Next, follow the Section 2: Selecting and sorting data. Use Python’s """ strings to support multiple lines.

Note that you don’t need to use print(), as Jupyter Notebook will output the result of the last line, which with Panda becomes a HTML table:

pd.read_sql_query("""
SELECT title
FROM articles
""", db)
Title
0The Fisher Thermodynamics of Quasi-Probabilities
1Aflatoxin Contamination of the Milk Supply: A ...
2Metagenomic Analysis of Upwelling-Affected Bra...
3Synthesis and Reactivity of a Cerium(III) Scor...
4Performance and Uncertainty Evaluation of Snow...
......
996Crystal structure of [3-(1H-benzimidazol-2-yl)...
997Crystal structure of bis(3-bromopyridine-κN)bi...
998Crystal structure of 4,4′-(ethane-1,2-diyl)bis...
999Crystal structure of (Z)-4-[1-(4-acetylanilino...
1000Metagenomic Analysis of Upwelling-Affected Bra...

1001 rows × 1 columns

SQL results in Python

Above, Pandas’ DataFrame is rendered in Juyter Notebook, skipping most of the 1001 rows. Alternatively we can inspect the table returned in Python. This is beneficial in case you need to do further processing in code.

Note: You do not have to use Panda dataframes for accessing SQL databases from Python, see also the SQLite Python tutorial for accessing the database directly.

Below, the method .head() is used to only select the first couple lines of the dataframe, while .itertuples() iterates over each row in the SQL result. See the Getting started with pandas for further guidance on dataframes.

Warning: While SQL is generally case-insensitive, the database’s spelling of the column name will be returned, which in this case is Title (instead of title as in the query) – this matters below as Python code is case-sensitive.

articles = pd.read_sql_query("""
SELECT Title
FROM articles
""", db)
for article in articles.head().itertuples():
  print(article.Title)
The Fisher Thermodynamics of Quasi-Probabilities
Aflatoxin Contamination of the Milk Supply: A Pakistan Perspective
Metagenomic Analysis of Upwelling-Affected Brazilian Coastal Seawater Reveals Sequence Domains of Type I PKS and Modular NRPS
Synthesis and Reactivity of a Cerium(III) Scorpionate Complex Containing a Redox Non-Innocent 2,2′-Bipyridine Ligand
Performance and Uncertainty Evaluation of Snow Models on Snowmelt Flow Simulations over a Nordic Catchment (Mistassibi, Canada)

Tip: You can modify the code of Jupyter notebook, and click each cell’s ▶️ button to run them. Use the [+ Code] button in Colab to add further code blocks. Use one code block per output.

Errors

Before we continue, let’s look quickly at different errors in the SQL, which will cause a Python exception.

Syntax error

The Python code is valid, but the SQL inside has a typo (SELET instead of SELECT):

pd.read_sql_query("""
SELET title
FROM articles
""", db)
---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in execute(self, sql, params)
   2673         try:
-> 2674             cur.execute(sql, *args)
   2675             return cur


OperationalError: near "SELET": syntax error


The above exception was the direct cause of the following exception:


DatabaseError                             Traceback (most recent call last)

/tmp/ipython-input-22-3013307109.py in <cell line: 0>()
----> 1 pd.read_sql_query("""
      2 SELET title
      3 FROM articles
      4 """, db)


/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in read_sql_query(sql, con, index_col, coerce_float, params, parse_dates, chunksize, dtype, dtype_backend)
    524 
    525     with pandasSQL_builder(con) as pandas_sql:
--> 526         return pandas_sql.read_query(
    527             sql,
    528             index_col=index_col,


/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)
   2736         dtype_backend: DtypeBackend | Literal["numpy"] = "numpy",
   2737     ) -> DataFrame | Iterator[DataFrame]:
-> 2738         cursor = self.execute(sql, params)
   2739         columns = [col_desc[0] for col_desc in cursor.description]
   2740 


/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in execute(self, sql, params)
   2684 
   2685             ex = DatabaseError(f"Execution failed on sql '{sql}': {exc}")
-> 2686             raise ex from exc
   2687 
   2688     @staticmethod


DatabaseError: Execution failed on sql '
SELET title
FROM articles
': near "SELET": syntax error

Semantic error

Here the query is correct syntactically, but it’s referencing a column that does not exist:

pd.read_sql_query("""
SELECT wrongColumn
FROM articles
""", db)
---------------------------------------------------------------------------

OperationalError                          Traceback (most recent call last)

/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in execute(self, sql, params)
   2673         try:
-> 2674             cur.execute(sql, *args)
   2675             return cur


OperationalError: no such column: wrongColumn


The above exception was the direct cause of the following exception:


DatabaseError                             Traceback (most recent call last)

/tmp/ipython-input-23-818033597.py in <cell line: 0>()
----> 1 pd.read_sql_query("""
      2 SELECT wrongColumn
      3 FROM articles
      4 """, db)


/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in read_sql_query(sql, con, index_col, coerce_float, params, parse_dates, chunksize, dtype, dtype_backend)
    524 
    525     with pandasSQL_builder(con) as pandas_sql:
--> 526         return pandas_sql.read_query(
    527             sql,
    528             index_col=index_col,


/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)
   2736         dtype_backend: DtypeBackend | Literal["numpy"] = "numpy",
   2737     ) -> DataFrame | Iterator[DataFrame]:
-> 2738         cursor = self.execute(sql, params)
   2739         columns = [col_desc[0] for col_desc in cursor.description]
   2740 


/usr/local/lib/python3.11/dist-packages/pandas/io/sql.py in execute(self, sql, params)
   2684 
   2685             ex = DatabaseError(f"Execution failed on sql '{sql}': {exc}")
-> 2686             raise ex from exc
   2687 
   2688     @staticmethod


DatabaseError: Execution failed on sql '
SELECT wrongColumn
FROM articles
': no such column: wrongColumn
  • First, try to correct the errors above, then re-execute the cells.

Next, follow the rest of section 2 onwards to the remaining SQL selection exercises.

Capitalization and good style

Follow exercise on good style below:

pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)

Unique values

Follow exercise on unique values

pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)

Sorting

Follow exercise on sorting

pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)

Filtering

Follow exercies on filtering

pd.read_sql_query("""
SELECT *
FROM articles
WHERE ISSNs = '2056-9890'
""", db).head()
idTitleAuthorsDOIURLSubjectsISSNsCitationLanguageIdLicenceIdAuthor_CountFirst_AuthorCitation_CountDayMonthYear
0142Crystal structure of 7-isopropyl-1,4a,N-trimet...Li Liu|Xin-Yan Yan|Xiao-Ping Rao10.1107/S2056989015017648https://doaj.org/article/69bfe1351c864bc8a6b38...crystal structure|dihydroabietic acid derivati...2056-9890Acta Crystallographica Section E: Crystallogra...113Li Liu101102015
1143Crystal structure of 2-(2,4-diphenyl-3-azabicy...K. Priya|K. Saravanan|S. Kabilan|S. Selvanayagam10.1107/S2056989015017740https://doaj.org/article/ce085e7ac3bd43e49a122...crystal structure|3-azabicyclononane derivativ...2056-9890Acta Crystallographica Section E: Crystallogra...114K. Priya101102015
2144Crystal structure of 2-[2-(hydroxyimino)-1-phe...Brian J. Anderson|Michael B. Freedman|Sean P. ...10.1107/S2056989015017739https://doaj.org/article/4dd1a69aaefa478b8d2bb...crystal structure|thiosemicarbazone|weak inter...2056-9890Acta Crystallographica Section E: Crystallogra...115Brian J. Anderson101102015
3145Crystal structure of 1-methoxy-5-methyl-N-phen...Inna S. Khazhieva|Tatiana V. Glukhareva|Pavel ...10.1107/S2056989015017776https://doaj.org/article/8c3bd9d71ba642f5b28d0...crystal structure|1,2,3-triazole|rearrangement...2056-9890Acta Crystallographica Section E: Crystallogra...114Inna S. Khazhieva101102015
4146Redetermined structure of 4,4′-bipyridine–1,4-...Rima Paul|Sanchay Jyoti Bora10.1107/S2056989015017569https://doaj.org/article/b2d1ca2bb6fb4ba48c11e...crystal structure|co-crystal|supramolecular in...2056-9890Acta Crystallographica Section E: Crystallogra...112Rima Paul101102015

Note that in SQL, value comparisons is done with a single equal sign =, unlike Python and other programing languagues typical use of == or .equals

pd.read_sql_query("""
SELECT ...
""", db).head()
pd.read_sql_query("""
SELECT ...
""", db).head()
pd.read_sql_query("""
SELECT ...
""", db).head()
pd.read_sql_query("""
SELECT ...
""", db).head()

Ordering and commenting

Follow exercises on Ordering and commenting

pd.read_sql_query("""
SELECT Title, Authors
FROM articles
WHERE ISSNs = '2067-2764|2247-6202'
ORDER BY First_Author ASC;
""", db)
TitleAuthors
0Zweier I-Convergent Double Sequence Spaces Def...A. Khan Vakeel| Khan Nazneen|Yasmeen
1New Čebyšev Type Inequalities for Functions wh...B. Meftah|K. Boukerrioua
2Initial Maclaurin Coecients Bounds for New Sub...Basem Aref Frasin|Tariq Al-Hawary
3Measure of Tessellation Quality of Voronoï MeshesE. A-iyeh|J.F. Peters
4The Applicability of $-Calculus to Solve Some ...Eugene Eberbach
5Some Families of q-Series Identities and Assoc...H. M. Srivastava|S. N. Singh|S. P. Singh
6Lp - Approximation of Analytic Functions on Co...Kumar Devendra|Jain Vandna
7A Mixed Integer Linear Programming Formulation...Marija Ivanović
8Properties of Stabilizing ComputationsMark Burgin
9Primality Testing and Factorization by using F...Musha Takaaki
10Luhn Prime NumbersOctavian Cira|Florentin Smarandache
11Hadamard Product of Certain Harmonic Univalent...R. M. El-Ashwah|B. A. Frasin
12Second Hankel Determinant for Generalized Saka...S. P. Vijayalakshmi|T. V. Sudharsan
13Some Perturbed Ostrowski Type Inequalities for...S. S. Dragomir
14Katsaras’s Type Fuzzy Norm under Triangular NormsSorin Nădăban|Tudor Bînzar|Flavius Pater|Carme...
15On BVσ I-convergent Sequence Spaces Defined by...Vakeel A. Khan|Mohd Shafiq|Rami Kamel Ahmad Ra...
16Coupled Systems of Fractional Integro-Differen...Zoubir Dahmani|Mohamed Amin Abdellaoui|Mohamed...
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)

Aggregating values

Follow exercise on Aggregating and calculating values

pd.read_sql_query("""
SELECT ISSNs, AVG(Citation_Count)
FROM articles
GROUP BY ISSNs;
""", db)
ISSNsAVG(Citation_Count)
00367-0449|1988-32504.000000
11099-43004.000000
21420-30494.000000
31422-00678.000000
41424-82204.000000
51660-33974.000000
61660-460110.000000
71996-19444.000000
81999-49074.000000
91999-49154.000000
102056-98909.593240
112067-2764|2247-62028.647059
122071-10505.000000
132072-42925.000000
142072-66434.000000
152072-66514.000000
162072-66944.000000
172073-43444.000000
182073-43604.000000
192073-43954.000000
202073-44254.000000
212073-44414.000000
222073-89944.000000
232075-163X4.000000
242075-17024.000000
252075-44184.000000
262075-44424.000000
272075-47014.000000
282075-53094.000000
292076-07874.000000
302076-26154.000000
312076-34175.000000
322076-39054.000000
332076-393X4.000000
342077-03754.000000
352077-04725.000000
362077-14444.000000
372078-15474.000000
382079-63744.000000
392079-77374.000000
402079-89544.000000
412079-92688.000000
422220-99648.000000
432227-97174.000000
442250-11777.000000
452278-4748|2278-48028.000000
462304-67405.000000
472304-67675.000000
482305-63043.000000
492306-53384.000000
502306-53545.000000
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)
pd.read_sql_query("""
SELECT ...
""", db)

Joins and aliases

Follow exercises on Joins and aliases

pd.read_sql_query("""
SELECT *
FROM articles
JOIN journals
ON articles.ISSNs = journals.ISSNs;
""", db).head()
idTitleAuthorsDOIURLSubjectsISSNsCitationLanguageIdLicenceId...First_AuthorCitation_CountDayMonthYearidISSN-LISSNsPublisherIdJournal_Title
00The Fisher Thermodynamics of Quasi-ProbabilitiesFlavia Pennini|Angelo Plastino10.3390/e17127853https://doaj.org/article/b75e8d5cca3f46cbbd63e...Fisher information|quasi-probabilities|complem...1099-4300Entropy, Vol 17, Iss 12, Pp 7848-7858 (2015)11...Flavia Pennini41112015141099-43001099-43002Entropy
11Aflatoxin Contamination of the Milk Supply: A ...Naveed Aslam|Peter C. Wynn10.3390/agriculture5041172https://doaj.org/article/0edc5af6672641c0bd456...aflatoxins|AFM1|AFB1|milk marketing chains|hep...2077-0472Agriculture (Basel), Vol 5, Iss 4, Pp 1172-118...11...Naveed Aslam5111201512077-04722077-04722Agriculture
22Metagenomic Analysis of Upwelling-Affected Bra...Rafael R. C. Cuadrat|Juliano C. Cury|Alberto M...10.3390/ijms161226101https://doaj.org/article/d9fe469f75a0442382b84...PKS|NRPS|metagenomics|environmental genomics|u...1422-0067International Journal of Molecular Sciences, V...11...Rafael R. C. Cuadrat81112015221422-00671422-00672International Journal of Molecular Sciences
33Synthesis and Reactivity of a Cerium(III) Scor...Fabrizio Ortu|Hao Zhu|Marie-Emmanuelle Boulon|...10.3390/inorganics3040534https://doaj.org/article/95606ed39deb4f43b96f7...lanthanide|cerium|scorpionate|tris(pyrazolyl)b...2304-6740Inorganics (Basel), Vol 3, Iss 4, Pp 534-553 (...11...Fabrizio Ortu51112015202304-67402304-67402Inorganics
44Performance and Uncertainty Evaluation of Snow...Magali Troin|Richard Arsenault|François Brissette10.3390/hydrology2040289https://doaj.org/article/18b1d70730d44573ab5c2...snow models|hydrological models|snowmelt|uncer...2306-5338Hydrology, Vol 2, Iss 4, Pp 289-317 (2015)11...Magali Troin41112015192306-53382306-53382Hydrology

5 rows × 21 columns

pd.read_sql_query("""
SELECT ...
""", db).head()
pd.read_sql_query("""
SELECT ...
""", db).head()
pd.read_sql_query("""
SELECT ...
""", db).head()
pd.read_sql_query("""
SELECT ...
""", db).head()

Aliases

Follow exercises on aliases

pd.read_sql_query("""
SELECT ...
""", db).head()

Saving queries

Follow exercise of Saving queries

Note: As DROP VIEW and CREATE VIEW do not return any result, below we have to use db.execute directly rather than pd.read_sql_query.

db.execute("DROP VIEW IF EXISTS journal_counts")

db.execute("""
CREATE VIEW ...

""")
pd.read_sql_query("""
SELECT ...
""", db).head()

Remember that a view is not stored as a separate table, but is a pre-canned query that can then itself be queries. That means updates to the tables will modify the views! SQL Views typically cannot themselves be modified with INSERT.

Nevertheless, if a view was temporary, we may want to delete it so it does clutter our database:

db.execute("DROP VIEW journal_counts")

Additional reading

The remaining sections of Library Carpentry SQL relates to creating databases before they can be queried.

Tip: Sometimes knowing what kind of queries are needed will influence the design at earlier data engineering phases, e.g. reducing the need for complex JOINs.