Page tree
Skip to end of metadata
Go to start of metadata

Table of contents

 

Deduplicating lists

Don is a headmaster of schools and he wants to send greeting cards to pupils' parents and to providers. Unfortunately, the lists have quite a few names in common, such as providers who have their children at school. We'll help Don de-duplicate his list. Let's say you have two tables:

Parents
NameAddress
Michael Wilson72 Oxford Street 
Brian Carter16 Darlinghurst Street
Karen Collins82 Newtown Park
Providers
NameContactAddress
Catering & CoSharon Young91 York Street
Phone Services LLCJason Miller11 George Street
Post OfficeBrian Carter16 Darlinghurst Street
SELECT DISTINCT Name, Address FROM
(
  SELECT Name, Address FROM Parents
  UNION ALL
  SELECT Name, Address FROM Providers
) ALL_CONTACTS

This query add all records of Parents and Providers into an intermediary table ALL_CONTACTS, then only returns distincts records.

Subtracting lists

Subtracting lists is simpler. This query only returns parents who are not providers:

SELECT Name, Address FROM Parents
WHERE Parents.Name NOT IN (SELECT Name FROM Providers)

The subquery finds all names of providers, and the main query returns all the parents whose name is not on that list. SQL supports other interesting queries:

SELECT Name FROM Parents
EXCEPT
SELECT Name FROM Providers
SELECT Name FROM Parents
INTERSECT
SELECT Name FROM Providers

The All-Star Pivot Query

There's a little trick in SQL where you put rows into columns:

NameRole: ParentRole: Provider
Michael WilsonX 
Brian CarterXX
Karen CollinsX 
Sharon Young X
Jason Miller X

Here's an example:

SELECT Name, MAX(CASE Role WHEN 'Parent' THEN 'X' END) "Role: Parent", MAX(CASE Role WHEN 'Provider' THEN 'X' END) "Role: Provider"
FROM (
    SELECT Name, 'Parent' Role FROM Parents
	UNION ALL
	SELECT Name, 'Provider' Role FROM Providers
) Roles
GROUP BY Name
  • The subquery 'Roles' creates a list with two columns: Name and Role. The Role is just the expression 'Parent' or 'Provider'.
  • The "CASE" in the main query are equivalent to an IF/THEN/ELSE function: It takes the Role, it displays 'X' if it is equal to 'Parent', otherwise it displays NULL.
  • The MAX returns the maximum value for the Name. When the person is both a parent and a provider, there are 2 records about it:

    NameRole: ParentRole: Provider
    Brian CarterX 
    Brian Carter X

     

    The origin of those two rows is, one comes from the Parents table and the other from the Providers table. To merge the two rows, we take the max of each column. NULL being weaker, the X will be the result:

    NameRole: ParentRole: Provider
    Brian CarterXX

SQL from external programs - A script in 3 minutes!

SQL is an industry standard. A lot of database management programs can interface with SQL, especially scripts. Let's take a 3-minutes example using Groovy. The following example is for Linux and Mac (Windows users can install Groovy with one or two more steps).

  • Please ask your Confluence Administrator for the details of the datasource they use. Depending on the situation, you may or may not be granted access to it.
  • Please download the postgresql driver and put it in a folder,
  • In the same folder, create a file named example.groovy, and copy the following contents:
example.groovy
#!/usr/bin/env groovy -cp postgresql-9.2-1002.jdbc4.jar

import groovy.sql.Sql
sql = Sql.newInstance( 'jdbc:postgresql://localhost:5432/databasename', 'username', 'password', 'org.postgresql.Driver' )
sql.eachRow( 'select * from SPACE_DS.PUPILS' ) {
	println "$it.name is in grade $it.grade"
} 
  • Then execute the following lines in Terminal:
# Install Groovy
> sudo apt-get install groovy
 
# Make your script executable
> chmod u+x example.groovy
 
# Execute the script
> ./example.groovy
 
Noah is in grade 11
Kim is in grade 8
Viktor is in grade 3
...

Here is more information about Groovy scripts and SQL.

(tick) Now, you know how simple it is to read and write into a database!

 

On this page, you've learnt:

  • How to deduplicate
  • How to intersect and subtract
  • How to write a pivot query
  • How to interface with other programs

 

If you find any mistake on this page, please email me. My license for this website is restricted to a few users.

  • No labels