Blog

Here is a graphic representation of SQL Join (focusing on INNER, LEFT, and (FULL OUTER types)). The inner join can be understood by the top left visual where 2 venn diagrams overlap (this is where the 2 tables are similar). Left join is keeping the left circle and where the 2 overlap (left table and where they are similar). Then the full (or outer) join is including everything in the diagram while also inserting “NA” or “None” where data is missing because of the join.
SQL Joins Demystified: INNER, LEFT, and OUTER in Action
Problem Statement
In data science, combining data from multiple tables is common, but choosing the wrong SQL join can lead to incomplete or misleading results. This tutorial helps students understand how INNER, LEFT, and FULL OUTER joins affect outcomes when merging tables (e.g., customers and orders), and guides on selecting the right join for analysis tasks.
Key Steps/Sections
0. SQL Reminders
SELECT: Specifies the columns you want to retrieve from your tables. For example, to get an animal’s name and species, you’d writeSELECT name, species. In joins, you’ll often select columns from multiple tables, so clarity is key.FROM: Indicates the table(s) you’re querying. In a join, you’ll list multiple tables here, likeFROM zoo_animals, exhibits, but we useJOINto link them properly.ON: Defines the condition for matching rows between tables in a join. For example,ON zoo_animals.animal_id = exhibits.animal_idlinks the tables using theanimal_idcolumn. This is the glue that makes joins work.tableinitial.columnname: When working with multiple tables, column names might overlap (e.g., both tables could have anidcolumn). To avoid ambiguity, use the table’s alias or name followed by a dot, likezoo_animals.animal_idorexhibits.exhibit_name. Aliases (e.g.,zoo_animals AS z) make this shorter:z.animal_id. This is critical for clear, error-free queries.;(semicolon): Always end your SQL statements with a semicolon. It tells the database your query is complete, like a period in a sentence. Forgetting it can cause errors in some SQL systems (e.g., PostgreSQL), though SQLite is more forgiving.
1. Setup: Two Simple Tables
Let’s start with two basic tables to illustrate joins. Imagine we have a customers table (with customer details) and an orders table (with purchase records). We’ll use a primary key (customer_id) in customers that acts as a foreign key in orders. For this static tutorial, we’ll define the tables directly as Markdown tables below. In a real SQL environment (like SQLite, MySQL, or PostgreSQL), you would create these tables using CREATE TABLE and INSERT statements.
Customers Table:
| customer_id | name | |
|---|---|---|
| 1 | Alice | alice@email.com |
| 2 | Bob | bob@email.com |
| 3 | Charlie | charlie@email.com |
| 4 | Dana | dana@email.com |
| 5 | Eve | eve@email.com |
Orders Table:
| order_id | customer_id | product | amount |
|---|---|---|---|
| 101 | 1 | Laptop | 1200.0 |
| 102 | 2 | Phone | 800.0 |
| 103 | 1 | Tablet | 300.0 |
| 104 | 6 | Monitor | 200.0 |
| 105 | 3 | Keyboard | 50.0 |
2. INNER JOIN: Matching Only
An INNER JOIN returns only rows where there’s a match in both tables based on the join condition (e.g., customer_id).
SELECT c.customer_id, c.name, o.order_id, o.product, o.amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;Expected Result:
| customer_id | name | order_id | product | amount |
|---|---|---|---|---|
| 1 | Alice | 101 | Laptop | 1200.0 |
| 1 | Alice | 103 | Tablet | 300.0 |
| 2 | Bob | 102 | Phone | 800.0 |
| 3 | Charlie | 105 | Keyboard | 50.0 |
Notice: Only matched customers (1,2,3) appear; Dana (4), Eve (5), and the unmatched order (104 for customer 6) are excluded.
3. LEFT JOIN: Keep All from Left
A LEFT JOIN keeps all rows from the left table (customers), with matching rows from the right (orders). Unmatched right-side columns get NULL.
SELECT c.customer_id, c.name, o.order_id, o.product, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;| customer_id | name | order_id | product | amount |
|---|---|---|---|---|
| 1 | Alice | 101 | Laptop | 1200.0 |
| 1 | Alice | 103 | Tablet | 300.0 |
| 2 | Bob | 102 | Phone | 800.0 |
| 3 | Charlie | 105 | Keyboard | 50.0 |
| 4 | Dana | None | None | None |
| 5 | Eve | None | None | None |
Here, all customers are included, with NULLS for Dana and Eve’s orders. Unmatched orders (like 104) are dropped.
4. FULL OUTER JOIN: Everything, Everywhere
A FULL OUTER JOIN (or just OUTER JOIN in some dialects) includes all rows from both tables, with NULLs where there’s no match. (Note: In SQLite, this is FULL OUTER JOIN; in MySQL, it’s emulated with UNION of LEFT and RIGHT joins.)
SELECT c.customer_id, c.name, o.order_id, o.product, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;| customer_id | name | order_id | product | amount |
|---|---|---|---|---|
| 1 | Alice | 101 | Laptop | 1200.0 |
| 1 | Alice | 103 | Tablet | 300.0 |
| 2 | Bob | 102 | Phone | 800.0 |
| 3 | Charlie | 105 | Keyboard | 50.0 |
| 4 | Dana | None | None | None |
| 5 | Eve | None | None | None |
| 6 | None | 104 | Monitor | 200.0 |
This shows everything: matched rows, plus unmatched customers (4,5) and the unmatched order (104).
5. Comparing Join Results
This table depicts a summary about SQL Joins based off of the examples we used.
| Join Type | Rows Returned | Includes Unmatched Left? | Includes Unmatched Right? | Use Case Example |
|---|---|---|---|---|
| INNER JOIN | 4 | No | No | Get only customers with orders |
| LEFT JOIN | 6 | Yes (with NULLs) | No | Analyze all customers, including those without orders |
| FULL OUTER JOIN | 7 | Yes (with NULLs) | Yes (with NULLs) | Audit all data, spotting orphans in either table |
Call to Action
Zoo Animals Table
This table lists animals in a zoo, with animal_id as the primary key.
| animal_id | name | species |
|---|---|---|
| 1 | Leo | Lion |
| 2 | Ellie | Elephant |
| 3 | Gina | Giraffe |
| 4 | Penny | Penguin |
| 5 | Toby | Tiger |
Exhibits Table
This table lists exhibit assignments, with animal_id as a foreign key referencing the zoo_animals table.
| exhibit_id | animal_id | exhibit_name | area_sqft |
|---|---|---|---|
| 101 | 1 | Savanna | 5000 |
| 102 | 3 | Tall Plains | 3000 |
| 103 | 1 | Big Cats | 2000 |
| 104 | 6 | Jungle | 4000 |
| 105 | 2 | Elephant Enclosure | 6000 |
Notes for Practice: - animal_id 6 in the exhibits table has no match in zoo_animals (unmatched right). - Animals like Penny (4) and Toby (5) in zoo_animals have no exhibits (unmatched left). - Use these tables to practice joins in an SQL editor like DB-Fiddle by creating them with CREATE TABLE and INSERT statements, similar to the SQL snippets in the previous tutorial.