FULL OUTER JOIN with UNION and CROSS JOIN part 2

Tags: queries, JOIN, SQL
Publish Date: 2016-05-28

FULL OUTER JOIN, UNION and CROSS JOIN part 2

In the previous post, we have seen how to perform a FULL OUTER JOIN and what would be a practical scenario to perform such a query.

Here in part 2, based on the coaches and teams tables we used before and our role as bookmakers (who allows people to place bets on which coach will be appointed with whichever team), you will see how to perform a CROSS JOIN and when it could be useful.

id name
teams
1 Dallas Mavericks
2 L.A. Clippers
3 Houston Rockets
4 Memphis Grizzlies
5 San Antonio Spurs
coaches
id name team_id
1 Rick Carlisle 1
2 Doc Rivers 2
3 Kevin McHale NULL
4 Dave Joerger NULL
5 Gregg Popovich 5
6 George Karl NULL

 

Cross Join

A CROSS JOIN produces a cartesian product between two tables. As an example, based on the coaches and teams tables we have above, it will produce all possible coaches-teams combinations:

SELECT c.name as coach, t.name as team
FROM coaches c
CROSS JOIN teams t;

The query would returns 30 rows, based on 6 coaches * 5 teams. However, this result-set won't have much practical use in our case.

 

Practical use case for a CROSS JOIN

What would be very useful, is to find all possible combinations of coaches WITHOUT a team and teams WITHOUT a coach. Once we have this, as bookmakers, we could set the odds and let people place bets on it.

In the example below, the sub-select returns the teams without a coach and the outer-select returns coaches without a team.

SELECT c.name as coach, t.name as team
FROM coaches c
CROSS JOIN(
	SELECT t.name
	FROM coaches c
	RIGHT JOIN teams t ON t.id = c.team_id
	WHERE c.id IS NULL
) t 
WHERE c.team_id IS NULL;

The query above would produce:

coach team
Kevin McHale Houston Rockets
Kevin McHale Memphis Grizzlies
Dave Joerger Houston Rockets
Dave Joerger Memphis Grizzlies
George Karl Houston Rockets
George Karl Memphis Grizzlies

 

So this returns all combinations of coaches without a job and teams looking for a coach.

We can take it a step further, first by creating a table that holds the odds for a coach being appointed with a team, then insert all possible combinations into this table that holds the odds.

CREATE TABLE coach_appointment_odds(
	id INT NOT NULL AUTO_INCREMENT,
    coach VARCHAR(256),
    team  VARCHAR(256),
    probability DECIMAL(7,2),
    odds VARCHAR(256),
    PRIMARY KEY (ID)
);

Now perform an INSERT based on the combinations we want from our CROSS JOIN:

INSERT INTO coach_appointment_odds(coach, team)
	SELECT c.name, t.name
	FROM coaches c
	CROSS JOIN(
		SELECT t.name 
		FROM coaches c
		RIGHT JOIN teams t on c.team_id = t.id
		WHERE c.id IS NULL
	) t
	WHERE c.team_id IS NULL;

This would then fill our table with:

id coach team probability odds
1 Kevin McHale Houston Rockets NULL NULL
2 Kevin McHale Memphis Grizzlies NULL NULL
3 Dave Joerger Houston Rockets NULL NULL
4 Dave Joerger Memphis Grizzlies NULL NULL
5 George Karl Houston Rockets NULL NULL
6 George Karl Memphis Grizzlies NULL NULL

 

Then you can set the odds by updating the records. For example:

UPDATE coach_appointment_odds
SET probability=0.25, odds="4:1"
WHERE id=2;

 

FULL OUTER JOIN with UNION and CROSS JOIN part 1

Tags: queries, JOIN, SQL
Publish Date: 2016-05-28

FULL OUTER JOIN, UNION and CROSS JOIN part 1

While most of us developers are used to the INNER JOIN and LEFT / RIGHT JOIN queries, FULL OUTER JOIN and CROSS JOIN queries aren't something we come accross regularly. But what are exactly FULL OUTER JOIN and CROSS JOIN? And what is a useful scenario for them?

Before we look into a FULL OUTER JOIN, one must understand the concept behind LEFT JOIN and RIGHT JOIN. If you do already, you can skip though to FULL OUTER JOIN. Jump to the next article if you are only interested in a CROSS JOIN.

 

LEFT JOIN and RIGHT JOIN

Here is a brief recap on LEFT JOIN and RIGHT JOIN

When we perform a LEFT JOIN, the database returns all records from the "LEFT TABLE" (that is the table after the FROM keyword), regardless of whether it manages to find a corresponding record in the "RIGHT TABLE" (that is the table after the JOIN keyword), based on the given join condition. A RIGHT JOIN is the vice versa.

As an example, let's assume we have the following tables:

id name
authors
1 Malcolm Gladwell
2 Alan Watts
3 Aspiring Writer ;)
id title author_id
books
1 Outliers 1
2 David and Goliath 1
3 The Way of Zen 2
4 I Ching NULL

When we perform a LEFT JOIN like this:

SELECT b.title as title, a.name author
FROM books b
LEFT JOIN authors a ON b.author_id = a.id;

It returns all books (left table), including the ones we could not find its author (right table).

title author
Outliers Malcolm Gladwell
David and Goliath Malcolm Gladwell
The Way of Zen Alan Watts
I Ching NULL

 

When we perform a RIGHT JOIN like this:

SELECT b.title as title, a.name author
FROM books b
RIGHT JOIN authors a ON b.author_id = a.id;

It returns all authors (right table), including the ones that we could not find a book for (left table).

title author
Outliers Malclom Gladwell
David and Goliath Malcolm Gladwell
The Way of Zen Alan Watts
NULL Aspiring Writer ;)

 

 

FULL OUTER JOIN

A FULL OUTER JOIN is basically an aggregation of a LEFT JOIN with a RIGHT JOIN, without any duplicates.

In the previous examples, "Outliers" by Malcolm Gladwell appears in the results of both LEFT and RIGHT JOINs. In a FULL OUTER JOIN, this record will appear only once instead of twice.

In MySQL, the FULL OUTER JOIN syntax is not supported, so to achieve a FULL OUTER JOIN, we must use the keyword UNION.

UNION and UNION ALL

The UNION keyword performs an aggregation of two select queries and remove any duplicate records, while UNION ALL will keep these duplicates.

In order to perform an UNION (or UNION ALL), the number of columns to be returned from both queries must match. Besides of that, you better select the same columns in the same order in both queries, otherwise your result set wouldn't make sense.

Unfortunately, MySQL lets you get away with specifying different columns or same columns in a different order! But for the sake of good practice, please make sure the columns selected in both queries are exactly the same and in the same order.

Basic Syntax

The basic syntax to perform a FULL OUTER JOIN in MySQL:

SELECT a.column_1, b.column_1 [... more columns]
FROM table_a a
LEFT JOIN table_b b ON b.foreign_key = a.primary_key
UNION
SELECT a.column_1, b.column_1 [... more columns]
FROM table_a a
RIGHT JOIN table_b b ON b.foreign_key = a.primary_key;

 

Practical use case for a FULL OUTER JOIN

Most of the time when we perform a join query (inner, left or right), it happens in a scenario where one of the table has a reference (foreign key) to the other and we want to grab the corresponding information from the other table. So for the books / authors example above, we could get data from the authors table based on the books we have.

However, a FULL OUTER JOIN excels in the opposite. It is great for finding records from both sides, for which you cannot find a corresponding record in the other table. As an example, below are tables of qualified head coaches and basketball teams:

id name
teams
1 Dallas Mavericks
2 L.A. Clippers
3 Houston Rockets
4 Memphis Grizzlies
5 San Antonio Spurs
coaches
id name team_id
1 Rick Carlisle 1
2 Doc Rivers 2
3 Kevin McHale NULL
4 Dave Joerger NULL
5 Gregg Popovich 5
6 George Karl NULL

 

A bookmaker would be interested in a list of coaches without a team and a list of teams without coach, so that he can let people place bets on which coach will be appointed with whichever team.

If we perform a plain vanilla FULL OUTER JOIN:

SELECT c.name as coach, t.name as team
FROM coaches c
LEFT JOIN teams t ON c.team_id = t.id
UNION
SELECT c.name as coach, t.name as team
FROM coaches c
RIGHT JOIN teams t ON c.team_id = t.id;

It would return the following:

coach team
Rick Carlisle Dallas Mavericks
Doc Rivers L.A. Clippers
Kevin McHale NULL
Dave Joerger NULL
Gregg Popovich San Antonio Spurs
George Karl NULL
NULL Houston Rockets
NULL Memphis Grizzlies

 

The results from a plain FULL OUTER JOIN are not exactly very useful in our case, as we are only interested in coaches without a team or teams without a coach.
So to do that, we could either add WHERE-clauses in each select statement:

SELECT c.name as coach, t.name as team
FROM coaches c
LEFT JOIN teams t ON c.team_id = t.id
WHERE t.id IS NULL
UNION
SELECT c.name as coach, t.name as team
FROM coaches c
RIGHT JOIN teams t ON c.team_id = t.id
WHERE c.id IS NULL;

Or use a subquery:

SELECT * 
FROM (
	SELECT c.name as coach, t.name as team
	FROM coaches c
	LEFT JOIN teams t ON c.team_id = t.id
	UNION
	SELECT c.name as coach, t.name as team
	FROM coaches c
	RIGHT JOIN teams t ON c.team_id = t.id
) results 
WHERE coach IS NULL or team IS NULL;

Either of these FULL OUTER JOINs would give us the results we are after: coaches WITHOUT a team and teams WITHOUT a coach.

coach team
Kevin McHale NULL
Dave Joerger NULL
George Karl NULL
NULL Houston Rockets
NULL Memphis Grizzlies

 

In the next post, you will see a practical example of how a CROSS JOIN can be applied on the coaches / teams data that we have.

systemd vs sysvinit part 2

Tags: systemctl, linux, systemd
Publish Date: 2016-05-15

For a quick and dirty cheatsheet, see my other post

Differences between systemd and sysvinit (a.k.a init) - Part 2

In part 1 of this article, I have provided an overview of how sysvinit boots up and manages services. Here in part 2, I will show how this has changed with systemd.

 

Systemd - Boot Process

While in sysvinit there is a concept of runlevels, systemd is based on targets.

The idea behind a target is that a collection of units have to be activated when the system reaches a certain target. You can see the activation of units like dependencies for targets.

A target could be something like the old runlevels (e.g. multi-user.target), but it could also be something like hardware or network detection (e.g. network-online.target). A unit could be something like a service, mount, socket etc., but a target itself is also a unit.

So you can interpret this as: "Make sure that certain services, mounts, sockets etc are activated when my system reaches a certain state."

For a list of units in systemd see systemctl -t help

[root@10 system]# systemctl -t help
Available unit types:
service
socket
target
device
mount
automount
snapshot
timer
swap
path
slice
scope

During the boot process, the first process (pid 1) is (/lib/systemd/)systemd. In fact, you will see that the old /sbin/init is a symlink to /lib/systemd/systemd

[root@10 system]# ls -l /sbin/ | grep systemd
lrwxrwxrwx. 1 root root       22 Jul 14  2015 init -> ../lib/systemd/systemd

Since systemd is no longer based on runlevels, it has gotten rid of the /etc/inittab file, where the default runlevel used to be defined. Instead, systemd finds its default target through /etc/systemd/system/default.target.

[root@10 system]# ls -l /etc/systemd/system
total 4
drwxr-xr-x. 2 root root   54 Jul 14  2015 basic.target.wants
lrwxrwxrwx. 1 root root   41 Jul 14  2015 dbus-org.fedoraproject.FirewallD1.service -> /usr/lib/systemd/system/firewalld.service
lrwxrwxrwx. 1 root root   46 Jul 14  2015 dbus-org.freedesktop.NetworkManager.service -> /usr/lib/systemd/system/NetworkManager.service
lrwxrwxrwx. 1 root root   57 Jul 14  2015 dbus-org.freedesktop.nm-dispatcher.service -> /usr/lib/systemd/system/NetworkManager-dispatcher.service
lrwxrwxrwx  1 root root   41 May 13 11:21 default.target -> /usr/lib/systemd/system/multi-user.target
drwxr-xr-x. 2 root root   85 Jul 14  2015 default.target.wants
drwxr-xr-x. 2 root root   31 Jul 14  2015 getty.target.wants
drwxr-xr-x. 2 root root 4096 Apr  6 09:45 multi-user.target.wants
drwxr-xr-x. 2 root root   49 Jul 14  2015 sockets.target.wants
drwxr-xr-x. 2 root root   59 Jul 14  2015 sysinit.target.wants
drwxr-xr-x. 2 root root   43 Jul 14  2015 system-update.target.wants

As you can see, my default.target is symlinked to multi-user.target. Which is a bit comparable to the old runlevel 3.

We can get and set the default runlevel with systemctl set-default and systemctl get-default

[root@10 system]# systemctl get-default 
multi-user.target
[root@10 system]# systemctl set-default graphical.target
rm '/etc/systemd/system/default.target'
ln -s '/usr/lib/systemd/system/graphical.target' '/etc/systemd/system/default.target'

For simplicity's sake, let's assume that I have set my default target back to multi-user.target. If we take a look at the multi-user.target file, we will see the following:

[root@10 system]# cat /usr/lib/systemd/system/multi-user.target

[Unit]
Description=Multi-User System
Documentation=man:systemd.special(7)
Requires=basic.target
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes

[Install]
Alias=default.target

Requires=basic.target means that all units within the basic.target.wants directories (more on the .wants directories later) must be activated first before the current unit (multi-user.target) can be activated. So for now, the control is handed over to basic.target, which in its turn requires sysinit.target.

Requires=sysinit.target

In sysinit.target, there is no Requires= directive specified, but it does have a Wants= directive.

Wants=local-fs.target swap.target

The Wants= directive lists the units that the current unit will try to activate. However, if these units fail to be activated, it won't stop the current unit (in this case sysinit.target) from being activated.

What systemd will do at this point, is to look for .wants directories under /etc/systemd/system/ and (/usr)/lib/systemd/system/ (more on their differences later) with the name of the the specified target. In this case, it finds /usr/lib/systemd/system/local-fs.target.wants/ and activates the units within that directory.

From that point, the control is passed back up the chain again. First back to sysinit.target and activates units within sysinit.target.wants in both /etc/systemd/system and (/usr)/lib/systemd/system directories, then the same for basic.target and finally back to multi-user.target. A great way to visualize how control has been passed from one target to another and which units have been activated in the process, is to use systemctl list-dependencies <name.target>

It is important to note that services in systemd are activated in a parallel manner, whereas in sysvinit was done sequentially (by the number of K- and S- scripts in runlevel directories).

Another note of importance is that there is a difference between the .wants directories in /etc/systemd/system/ and (/usr)/lib/systemd/system/. While the .wants directories in the former is setup by the system and should never be touched by users, the .wants directories in the latter allow the user to manage his own dependencies. This becomes obvious when you enable / disable services to be run upon boot.

Systemd - Enable service at startup

When you install a service (such as nginx in this example), your package manager will create a .service file in the /usr/lib/systemd/system/ directory.

[root@10 vagrant]# ls /usr/lib/systemd/system | grep nginx
nginx-debug.service
nginx.service

To make sure that a service is enabled upon boot, you can use the systemctl enable command:

[root@10 vagrant]# systemctl enable nginx
ln -s '/usr/lib/systemd/system/nginx.service' '/etc/systemd/system/multi-user.target.wants/nginx.service'

What we can see above, is that the systemctl enable command actually creates a symlink from the .wants directory of our default target (in our case multi-user.wants) to /usr/lib/systemd/system/ for our service. In a way a bit comparable to how the scripts in runlevel directories (/etc/rc{number}.d/) in sysvinint were symlinked to scripts in /etc/init.d/.

You can also use systemctl disable <service> to disable a service. In that case, the previously created symlink will be removed.

Systemd - start / stop services

In sysvinit, there were different commands for managing services at startup (chkconfig) and for running/stopping services (service). In systemd, that is all controlled by the systemctl command.

To start a service, use systemctl start <service>

[root@10 vagrant]# systemctl start nginx
[root@10 vagrant]# systemctl status nginx
nginx.service - nginx - high performance web server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled)
   Active: active (running) since Sun 2016-05-15 14:35:07 UTC; 2s ago
     Docs: http://nginx.org/en/docs/
  Process: 2735 ExecStop=/bin/kill -s QUIT $MAINPID (code=exited, status=0/SUCCESS)
  Process: 2744 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
  Process: 2743 ExecStartPre=/usr/sbin/nginx -t -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
 Main PID: 2747 (nginx)
   CGroup: /system.slice/nginx.service
           ├─2747 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
           └─2748 nginx: worker process


Systemd - Change targets

In sysvinit, a sys-admin might have to change runlevel to perform administrative tasks. In systemd, since there are no runlevels anymore, you will have to change target:

[root@10 system]# systemctl isolate rescue.target
[root@10 system]# who -r
[root@10 system]# runlevel
3 1 # indicates that the current runlevel is 1 and previous runlevel is 3

Ironically enough, to get the current target, you would still use the old who -r and runlevel commands.