Databases Pt. 2¶
Guest lecture by Bradley Dice, [@bdice](https://github.com/bdice/).
Example, continued: IMBD, Index of Marvel data with Bradley Dice¶
The code below sets up the movie database that we created last time, all in one cell.
In [1]:
import sqlite3
import os
database = 'movie_ratings.sqlite'
# Removes the database if it already exists
if os.path.exists(database):
os.remove(database)
with sqlite3.connect(database) as conn:
# Here we create a primary key, and use "NOT NULL" to prevent inserting invalid data
conn.execute(
"""CREATE TABLE Movies
(id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL, year INTEGER NOT NULL)""")
conn.executemany(
"""INSERT INTO Movies (title, year) VALUES (?,?)""",
[('Iron Man', 2008),
('The Incredible Hulk', 2008),
('Iron Man 2', 2010),
('Thor', 2011),
('Captain America: The First Avenger', 2011),
('The Avengers', 2012),
('Iron Man 3', 2013),
('Captain America: The Winter Soldier', 2014),
('Avengers: Age of Ultron', 2015),
('Captain America: Civil War', 2016),
('Doctor Strange', 2016),
('Black Panther', 2018),
('Avengers: Infinity War', 2018),
])
conn.execute(
"""CREATE TABLE Actors
(id INTEGER PRIMARY KEY AUTOINCREMENT,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
character TEXT NOT NULL,
age REAL NOT NULL)""")
conn.executemany(
"""INSERT INTO Actors (first_name, last_name, character, age) VALUES (?,?,?,?)""",
[('Robert', 'Downey Jr.', 'Iron Man', 53),
('Chris', 'Evans', 'Captain America', 37),
('Scarlett', 'Johansson', 'Black Widow', 33),
('Samuel L.', 'Jackson', 'Nick Fury', 69),
('Benedict', 'Cumberbatch', 'Dr. Strange', 42),
('Brie', 'Larson', 'Captain Marvel', 29),
('Chadwick', 'Boseman', 'Black Panther', 40)
])
conn.execute(
"""CREATE TABLE MovieActors
(id INTEGER PRIMARY KEY AUTOINCREMENT, movie_id INTEGER NOT NULL, actor_id INTEGER NOT NULL)""")
conn.executemany(
"""INSERT INTO MovieActors (movie_id, actor_id) VALUES (?,?)""",
[(1, 1), (2, 1), (3, 1), (6, 1), (7, 1), (9, 1), (10, 1), (13, 1), (5, 2), (6, 2), (8, 2), (9, 2), (10, 2), (13, 2),
(3, 3), (6, 3), (8, 3), (9, 3), (10, 3), (13, 3), (1, 4), (3, 4), (4, 4), (5, 4), (6, 4), (8, 4), (9, 4), (13, 4),
(11, 5), (13, 5), (10, 7), (12, 7), (13, 7)])
Using pandas 🐼 to help us:¶
In [2]:
import pandas as pd
from IPython.display import display
import matplotlib.pyplot as plt
%matplotlib inline
with sqlite3.connect(database) as conn:
df = pd.read_sql('SELECT * FROM Movies', conn)
display(df)
df.hist('year')
id | title | year | |
---|---|---|---|
0 | 1 | Iron Man | 2008 |
1 | 2 | The Incredible Hulk | 2008 |
2 | 3 | Iron Man 2 | 2010 |
3 | 4 | Thor | 2011 |
4 | 5 | Captain America: The First Avenger | 2011 |
5 | 6 | The Avengers | 2012 |
6 | 7 | Iron Man 3 | 2013 |
7 | 8 | Captain America: The Winter Soldier | 2014 |
8 | 9 | Avengers: Age of Ultron | 2015 |
9 | 10 | Captain America: Civil War | 2016 |
10 | 11 | Doctor Strange | 2016 |
11 | 12 | Black Panther | 2018 |
12 | 13 | Avengers: Infinity War | 2018 |
Review of table contents¶
The next three cells print the table content for Movies
, Actors
,
and MovieActors
.
In [3]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT * FROM Actors""", conn)
df
Out[3]:
id | first_name | last_name | character | age | |
---|---|---|---|---|---|
0 | 1 | Robert | Downey Jr. | Iron Man | 53.0 |
1 | 2 | Chris | Evans | Captain America | 37.0 |
2 | 3 | Scarlett | Johansson | Black Widow | 33.0 |
3 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 |
4 | 5 | Benedict | Cumberbatch | Dr. Strange | 42.0 |
5 | 6 | Brie | Larson | Captain Marvel | 29.0 |
6 | 7 | Chadwick | Boseman | Black Panther | 40.0 |
In [4]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT * FROM Movies""", conn)
df
Out[4]:
id | title | year | |
---|---|---|---|
0 | 1 | Iron Man | 2008 |
1 | 2 | The Incredible Hulk | 2008 |
2 | 3 | Iron Man 2 | 2010 |
3 | 4 | Thor | 2011 |
4 | 5 | Captain America: The First Avenger | 2011 |
5 | 6 | The Avengers | 2012 |
6 | 7 | Iron Man 3 | 2013 |
7 | 8 | Captain America: The Winter Soldier | 2014 |
8 | 9 | Avengers: Age of Ultron | 2015 |
9 | 10 | Captain America: Civil War | 2016 |
10 | 11 | Doctor Strange | 2016 |
11 | 12 | Black Panther | 2018 |
12 | 13 | Avengers: Infinity War | 2018 |
In [5]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT * FROM MovieActors""", conn)
df
Out[5]:
id | movie_id | actor_id | |
---|---|---|---|
0 | 1 | 1 | 1 |
1 | 2 | 2 | 1 |
2 | 3 | 3 | 1 |
3 | 4 | 6 | 1 |
4 | 5 | 7 | 1 |
5 | 6 | 9 | 1 |
6 | 7 | 10 | 1 |
7 | 8 | 13 | 1 |
8 | 9 | 5 | 2 |
9 | 10 | 6 | 2 |
10 | 11 | 8 | 2 |
11 | 12 | 9 | 2 |
12 | 13 | 10 | 2 |
13 | 14 | 13 | 2 |
14 | 15 | 3 | 3 |
15 | 16 | 6 | 3 |
16 | 17 | 8 | 3 |
17 | 18 | 9 | 3 |
18 | 19 | 10 | 3 |
19 | 20 | 13 | 3 |
20 | 21 | 1 | 4 |
21 | 22 | 3 | 4 |
22 | 23 | 4 | 4 |
23 | 24 | 5 | 4 |
24 | 25 | 6 | 4 |
25 | 26 | 8 | 4 |
26 | 27 | 9 | 4 |
27 | 28 | 13 | 4 |
28 | 29 | 11 | 5 |
29 | 30 | 13 | 5 |
30 | 31 | 10 | 7 |
31 | 32 | 12 | 7 |
32 | 33 | 13 | 7 |
JOINing [STRIKEOUT:forces] tables¶
This demonstrates a basic JOIN
operation.
In [6]:
with sqlite3.connect(database) as conn:
df = pd.read_sql(
"""SELECT Movies.title, Actors.character FROM MovieActors
JOIN Movies ON MovieActors.movie_id = Movies.id
JOIN Actors ON MovieActors.actor_id = Actors.id""", conn)
df
Out[6]:
title | character | |
---|---|---|
0 | Iron Man | Iron Man |
1 | The Incredible Hulk | Iron Man |
2 | Iron Man 2 | Iron Man |
3 | The Avengers | Iron Man |
4 | Iron Man 3 | Iron Man |
5 | Avengers: Age of Ultron | Iron Man |
6 | Captain America: Civil War | Iron Man |
7 | Avengers: Infinity War | Iron Man |
8 | Captain America: The First Avenger | Captain America |
9 | The Avengers | Captain America |
10 | Captain America: The Winter Soldier | Captain America |
11 | Avengers: Age of Ultron | Captain America |
12 | Captain America: Civil War | Captain America |
13 | Avengers: Infinity War | Captain America |
14 | Iron Man 2 | Black Widow |
15 | The Avengers | Black Widow |
16 | Captain America: The Winter Soldier | Black Widow |
17 | Avengers: Age of Ultron | Black Widow |
18 | Captain America: Civil War | Black Widow |
19 | Avengers: Infinity War | Black Widow |
20 | Iron Man | Nick Fury |
21 | Iron Man 2 | Nick Fury |
22 | Thor | Nick Fury |
23 | Captain America: The First Avenger | Nick Fury |
24 | The Avengers | Nick Fury |
25 | Captain America: The Winter Soldier | Nick Fury |
26 | Avengers: Age of Ultron | Nick Fury |
27 | Avengers: Infinity War | Nick Fury |
28 | Doctor Strange | Dr. Strange |
29 | Avengers: Infinity War | Dr. Strange |
30 | Captain America: Civil War | Black Panther |
31 | Black Panther | Black Panther |
32 | Avengers: Infinity War | Black Panther |
Types of JOINs¶
The default when you just write JOIN
is also called INNER JOIN
,
but there are also LEFT JOIN
, RIGHT JOIN
, and OUTER JOIN
.
There are different kinds of JOIN
s, explained nicely in this
resource.
The types of JOIN
s can be represented as Venn diagrams, visualized
on this website.
LEFT JOIN example¶
Note that this LEFT JOIN
leaves us with a NaN
for Brie Larson,
whose movie Captain Marvel has not been released yet.
In [7]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT Actors.first_name, Actors.last_name, MovieActors.movie_id
FROM Actors LEFT JOIN MovieActors ON Actors.id = MovieActors.actor_id""", conn)
df
Out[7]:
first_name | last_name | movie_id | |
---|---|---|---|
0 | Robert | Downey Jr. | 1.0 |
1 | Robert | Downey Jr. | 2.0 |
2 | Robert | Downey Jr. | 3.0 |
3 | Robert | Downey Jr. | 6.0 |
4 | Robert | Downey Jr. | 7.0 |
5 | Robert | Downey Jr. | 9.0 |
6 | Robert | Downey Jr. | 10.0 |
7 | Robert | Downey Jr. | 13.0 |
8 | Chris | Evans | 5.0 |
9 | Chris | Evans | 6.0 |
10 | Chris | Evans | 8.0 |
11 | Chris | Evans | 9.0 |
12 | Chris | Evans | 10.0 |
13 | Chris | Evans | 13.0 |
14 | Scarlett | Johansson | 3.0 |
15 | Scarlett | Johansson | 6.0 |
16 | Scarlett | Johansson | 8.0 |
17 | Scarlett | Johansson | 9.0 |
18 | Scarlett | Johansson | 10.0 |
19 | Scarlett | Johansson | 13.0 |
20 | Samuel L. | Jackson | 1.0 |
21 | Samuel L. | Jackson | 3.0 |
22 | Samuel L. | Jackson | 4.0 |
23 | Samuel L. | Jackson | 5.0 |
24 | Samuel L. | Jackson | 6.0 |
25 | Samuel L. | Jackson | 8.0 |
26 | Samuel L. | Jackson | 9.0 |
27 | Samuel L. | Jackson | 13.0 |
28 | Benedict | Cumberbatch | 11.0 |
29 | Benedict | Cumberbatch | 13.0 |
30 | Brie | Larson | NaN |
31 | Chadwick | Boseman | 10.0 |
32 | Chadwick | Boseman | 12.0 |
33 | Chadwick | Boseman | 13.0 |
In [8]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT Actors.first_name, Actors.last_name,
MAX(Movies.year) AS most_recent_release_year, Movies.title
FROM Actors JOIN MovieActors ON Actors.id = MovieActors.actor_id
JOIN Movies ON MovieActors.movie_id = Movies.id
GROUP BY Actors.last_name""", conn)
df
Out[8]:
first_name | last_name | most_recent_release_year | title | |
---|---|---|---|---|
0 | Chadwick | Boseman | 2018 | Black Panther |
1 | Benedict | Cumberbatch | 2018 | Avengers: Infinity War |
2 | Robert | Downey Jr. | 2018 | Avengers: Infinity War |
3 | Chris | Evans | 2018 | Avengers: Infinity War |
4 | Samuel L. | Jackson | 2018 | Avengers: Infinity War |
5 | Scarlett | Johansson | 2018 | Avengers: Infinity War |
More Derived Columns¶
Here, we make a derived column name
by concatenating first_name
with a space and last_name
. The syntax for other concatenation in
other SQL databases can differ, e.g. with the function CONCAT
.
In [9]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT *,
Actors.first_name || " " || Actors.last_name AS name FROM MovieActors
JOIN Movies ON MovieActors.movie_id = Movies.id
JOIN Actors ON MovieActors.actor_id = Actors.id""", conn)
df
Out[9]:
id | movie_id | actor_id | id | title | year | id | first_name | last_name | character | age | name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | Iron Man | 2008 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
1 | 2 | 2 | 1 | 2 | The Incredible Hulk | 2008 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
2 | 3 | 3 | 1 | 3 | Iron Man 2 | 2010 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
3 | 4 | 6 | 1 | 6 | The Avengers | 2012 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
4 | 5 | 7 | 1 | 7 | Iron Man 3 | 2013 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
5 | 6 | 9 | 1 | 9 | Avengers: Age of Ultron | 2015 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
6 | 7 | 10 | 1 | 10 | Captain America: Civil War | 2016 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
7 | 8 | 13 | 1 | 13 | Avengers: Infinity War | 2018 | 1 | Robert | Downey Jr. | Iron Man | 53.0 | Robert Downey Jr. |
8 | 9 | 5 | 2 | 5 | Captain America: The First Avenger | 2011 | 2 | Chris | Evans | Captain America | 37.0 | Chris Evans |
9 | 10 | 6 | 2 | 6 | The Avengers | 2012 | 2 | Chris | Evans | Captain America | 37.0 | Chris Evans |
10 | 11 | 8 | 2 | 8 | Captain America: The Winter Soldier | 2014 | 2 | Chris | Evans | Captain America | 37.0 | Chris Evans |
11 | 12 | 9 | 2 | 9 | Avengers: Age of Ultron | 2015 | 2 | Chris | Evans | Captain America | 37.0 | Chris Evans |
12 | 13 | 10 | 2 | 10 | Captain America: Civil War | 2016 | 2 | Chris | Evans | Captain America | 37.0 | Chris Evans |
13 | 14 | 13 | 2 | 13 | Avengers: Infinity War | 2018 | 2 | Chris | Evans | Captain America | 37.0 | Chris Evans |
14 | 15 | 3 | 3 | 3 | Iron Man 2 | 2010 | 3 | Scarlett | Johansson | Black Widow | 33.0 | Scarlett Johansson |
15 | 16 | 6 | 3 | 6 | The Avengers | 2012 | 3 | Scarlett | Johansson | Black Widow | 33.0 | Scarlett Johansson |
16 | 17 | 8 | 3 | 8 | Captain America: The Winter Soldier | 2014 | 3 | Scarlett | Johansson | Black Widow | 33.0 | Scarlett Johansson |
17 | 18 | 9 | 3 | 9 | Avengers: Age of Ultron | 2015 | 3 | Scarlett | Johansson | Black Widow | 33.0 | Scarlett Johansson |
18 | 19 | 10 | 3 | 10 | Captain America: Civil War | 2016 | 3 | Scarlett | Johansson | Black Widow | 33.0 | Scarlett Johansson |
19 | 20 | 13 | 3 | 13 | Avengers: Infinity War | 2018 | 3 | Scarlett | Johansson | Black Widow | 33.0 | Scarlett Johansson |
20 | 21 | 1 | 4 | 1 | Iron Man | 2008 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
21 | 22 | 3 | 4 | 3 | Iron Man 2 | 2010 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
22 | 23 | 4 | 4 | 4 | Thor | 2011 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
23 | 24 | 5 | 4 | 5 | Captain America: The First Avenger | 2011 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
24 | 25 | 6 | 4 | 6 | The Avengers | 2012 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
25 | 26 | 8 | 4 | 8 | Captain America: The Winter Soldier | 2014 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
26 | 27 | 9 | 4 | 9 | Avengers: Age of Ultron | 2015 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
27 | 28 | 13 | 4 | 13 | Avengers: Infinity War | 2018 | 4 | Samuel L. | Jackson | Nick Fury | 69.0 | Samuel L. Jackson |
28 | 29 | 11 | 5 | 11 | Doctor Strange | 2016 | 5 | Benedict | Cumberbatch | Dr. Strange | 42.0 | Benedict Cumberbatch |
29 | 30 | 13 | 5 | 13 | Avengers: Infinity War | 2018 | 5 | Benedict | Cumberbatch | Dr. Strange | 42.0 | Benedict Cumberbatch |
30 | 31 | 10 | 7 | 10 | Captain America: Civil War | 2016 | 7 | Chadwick | Boseman | Black Panther | 40.0 | Chadwick Boseman |
31 | 32 | 12 | 7 | 12 | Black Panther | 2018 | 7 | Chadwick | Boseman | Black Panther | 40.0 | Chadwick Boseman |
32 | 33 | 13 | 7 | 13 | Avengers: Infinity War | 2018 | 7 | Chadwick | Boseman | Black Panther | 40.0 | Chadwick Boseman |
Visualizing Data with Pandas¶
So how long have these actors been doing Marvel movies, anyway?
In [10]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT MAX(Movies.year) - MIN(Movies.year) AS years_of_service,
Actors.first_name || " " || Actors.last_name AS name FROM MovieActors
JOIN Movies ON MovieActors.movie_id = Movies.id
JOIN Actors ON MovieActors.actor_id = Actors.id
GROUP BY name
ORDER BY years_of_service DESC""", conn)
df.plot.bar(x='name', y='years_of_service')
And how old were they when they started?
In [11]:
with sqlite3.connect(database) as conn:
df = pd.read_sql("""SELECT Actors.age - (MAX(Movies.year) - MIN(Movies.year)) as first_movie_age,
Actors.first_name || " " || Actors.last_name as name FROM MovieActors
JOIN Movies ON MovieActors.movie_id = Movies.id
JOIN Actors ON MovieActors.actor_id = Actors.id
GROUP BY name
ORDER BY first_movie_age DESC""", conn)
df.plot.bar(x='name', y='first_movie_age')
We can visualize this data in many ways, including this graph. (Note that Brie Larson appears, who was absent from our earlier plots because her movie, Captain Marvel, hasn’t been released yet.)
In [12]:
# Fancy demo, requires networkx
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
# Create a networkx Graph object
G = nx.Graph()
# Fetch data and fill the graph
with sqlite3.connect(database) as conn:
movies = conn.execute("""SELECT title FROM Movies""")
G.add_nodes_from([movie[0] for movie in movies], bipartite=0)
actors = conn.execute("""SELECT first_name || " " || last_name FROM Actors""")
G.add_nodes_from([actor[0] for actor in actors], bipartite=1)
edges = conn.execute("""SELECT Movies.title, Actors.first_name || " " || Actors.last_name FROM MovieActors
JOIN Movies ON MovieActors.movie_id = Movies.id
JOIN Actors ON MovieActors.actor_id = Actors.id""")
G.add_edges_from(edges)
# Plot the graph
plt.figure(figsize=(16, 12))
pos = nx.kamada_kawai_layout(G)
nx.draw_networkx_nodes(G, pos, nodelist=[n for n, d in G.nodes(data=True) if d['bipartite']==0], node_color='gold')
nx.draw_networkx_nodes(G, pos, nodelist=[n for n, d in G.nodes(data=True) if d['bipartite']==1], node_color='blue')
nx.draw_networkx_labels(G, pos, font_size=14)
nx.draw_networkx_edges(G, pos)
plt.axis('off')
plt.show()
Observation: Avengers movies are near the center! What else could we learn from this data?
What if we wanted to add ratings? What kind of mapping is needed? What table columns would we use?
Common problems in designing relational databases¶
Update anomaly¶
### Insertion anomaly ### Deletion anomaly
Normal Form¶
There are rules to help us design relational databases! They’re called “normal form.” This is a term worth looking up if you need to design a database.
Chinook Sample Database¶
This is a sample database named after winds in Canada, which is a joke from another sample database called Northwind. I picked this because it’s rich and demonstrates a professional, production-level database schema.
In [13]:
!echo "Removing previously downloaded database." && rm -f Chinook.sqlite
!echo "Downloading database." && curl https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite > Chinook.sqlite
!echo "Done. Check that the file exists:"
!ls "Chinook.sqlite"
Removing previously downloaded database.
Downloading database.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1042k 100 1042k 0 0 176k 0 0:00:05 0:00:05 --:--:-- 313k
Done. Check that the file exists:
Chinook.sqlite
In [14]:
database = 'Chinook.sqlite'
def get_results(sql, *args, **kwargs):
with sqlite3.connect(database) as conn:
df = pd.read_sql(sql, conn, *args, **kwargs)
return df
def show_results(sql, *args, **kwargs):
display(get_results(sql, *args, **kwargs))
show_results("""SELECT * FROM sqlite_master""")
type | name | tbl_name | rootpage | sql | |
---|---|---|---|---|---|
0 | table | Album | Album | 2 | CREATE TABLE [Album]\n(\n [AlbumId] INTEGER... |
1 | table | Artist | Artist | 3 | CREATE TABLE [Artist]\n(\n [ArtistId] INTEG... |
2 | table | Customer | Customer | 4 | CREATE TABLE [Customer]\n(\n [CustomerId] I... |
3 | table | Employee | Employee | 7 | CREATE TABLE [Employee]\n(\n [EmployeeId] I... |
4 | table | Genre | Genre | 9 | CREATE TABLE [Genre]\n(\n [GenreId] INTEGER... |
5 | table | Invoice | Invoice | 10 | CREATE TABLE [Invoice]\n(\n [InvoiceId] INT... |
6 | table | InvoiceLine | InvoiceLine | 12 | CREATE TABLE [InvoiceLine]\n(\n [InvoiceLin... |
7 | table | MediaType | MediaType | 14 | CREATE TABLE [MediaType]\n(\n [MediaTypeId]... |
8 | table | Playlist | Playlist | 15 | CREATE TABLE [Playlist]\n(\n [PlaylistId] I... |
9 | table | PlaylistTrack | PlaylistTrack | 16 | CREATE TABLE [PlaylistTrack]\n(\n [Playlist... |
10 | index | sqlite_autoindex_PlaylistTrack_1 | PlaylistTrack | 17 | None |
11 | table | Track | Track | 19 | CREATE TABLE [Track]\n(\n [TrackId] INTEGER... |
12 | index | IPK_Album | Album | 21 | CREATE UNIQUE INDEX [IPK_Album] ON [Album]([Al... |
13 | index | IPK_Artist | Artist | 22 | CREATE UNIQUE INDEX [IPK_Artist] ON [Artist]([... |
14 | index | IPK_Customer | Customer | 23 | CREATE UNIQUE INDEX [IPK_Customer] ON [Custome... |
15 | index | IPK_Employee | Employee | 24 | CREATE UNIQUE INDEX [IPK_Employee] ON [Employe... |
16 | index | IPK_Genre | Genre | 26 | CREATE UNIQUE INDEX [IPK_Genre] ON [Genre]([Ge... |
17 | index | IPK_Invoice | Invoice | 27 | CREATE UNIQUE INDEX [IPK_Invoice] ON [Invoice]... |
18 | index | IPK_InvoiceLine | InvoiceLine | 28 | CREATE UNIQUE INDEX [IPK_InvoiceLine] ON [Invo... |
19 | index | IPK_MediaType | MediaType | 29 | CREATE UNIQUE INDEX [IPK_MediaType] ON [MediaT... |
20 | index | IPK_Playlist | Playlist | 30 | CREATE UNIQUE INDEX [IPK_Playlist] ON [Playlis... |
21 | index | IPK_PlaylistTrack | PlaylistTrack | 31 | CREATE UNIQUE INDEX [IPK_PlaylistTrack] ON [Pl... |
22 | index | IPK_Track | Track | 32 | CREATE UNIQUE INDEX [IPK_Track] ON [Track]([Tr... |
23 | index | IFK_AlbumArtistId | Album | 33 | CREATE INDEX [IFK_AlbumArtistId] ON [Album] ([... |
24 | index | IFK_CustomerSupportRepId | Customer | 34 | CREATE INDEX [IFK_CustomerSupportRepId] ON [Cu... |
25 | index | IFK_EmployeeReportsTo | Employee | 36 | CREATE INDEX [IFK_EmployeeReportsTo] ON [Emplo... |
26 | index | IFK_InvoiceCustomerId | Invoice | 37 | CREATE INDEX [IFK_InvoiceCustomerId] ON [Invoi... |
27 | index | IFK_InvoiceLineInvoiceId | InvoiceLine | 38 | CREATE INDEX [IFK_InvoiceLineInvoiceId] ON [In... |
28 | index | IFK_InvoiceLineTrackId | InvoiceLine | 39 | CREATE INDEX [IFK_InvoiceLineTrackId] ON [Invo... |
29 | index | IFK_PlaylistTrackTrackId | PlaylistTrack | 40 | CREATE INDEX [IFK_PlaylistTrackTrackId] ON [Pl... |
30 | index | IFK_TrackAlbumId | Track | 41 | CREATE INDEX [IFK_TrackAlbumId] ON [Track] ([A... |
31 | index | IFK_TrackGenreId | Track | 42 | CREATE INDEX [IFK_TrackGenreId] ON [Track] ([G... |
32 | index | IFK_TrackMediaTypeId | Track | 43 | CREATE INDEX [IFK_TrackMediaTypeId] ON [Track]... |
Table structures for music data¶
In [15]:
show_results("""PRAGMA table_info(Track)""") # sqlite specific
show_results("""PRAGMA table_info(Genre)""")
show_results("""PRAGMA table_info(Album)""")
show_results("""PRAGMA table_info(Artist)""")
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | TrackId | INTEGER | 1 | None | 1 |
1 | 1 | Name | NVARCHAR(200) | 1 | None | 0 |
2 | 2 | AlbumId | INTEGER | 0 | None | 0 |
3 | 3 | MediaTypeId | INTEGER | 1 | None | 0 |
4 | 4 | GenreId | INTEGER | 0 | None | 0 |
5 | 5 | Composer | NVARCHAR(220) | 0 | None | 0 |
6 | 6 | Milliseconds | INTEGER | 1 | None | 0 |
7 | 7 | Bytes | INTEGER | 0 | None | 0 |
8 | 8 | UnitPrice | NUMERIC(10,2) | 1 | None | 0 |
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | GenreId | INTEGER | 1 | None | 1 |
1 | 1 | Name | NVARCHAR(120) | 0 | None | 0 |
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | AlbumId | INTEGER | 1 | None | 1 |
1 | 1 | Title | NVARCHAR(160) | 1 | None | 0 |
2 | 2 | ArtistId | INTEGER | 1 | None | 0 |
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | ArtistId | INTEGER | 1 | None | 1 |
1 | 1 | Name | NVARCHAR(120) | 0 | None | 0 |
Generating a typical music library layout¶
In [16]:
show_results("""SELECT Track.Name, Album.Title, Artist.Name, Genre.Name FROM Track
JOIN Genre on Track.GenreId = Genre.GenreId
JOIN Album on Track.AlbumId = Album.AlbumId
JOIN Artist on Album.ArtistId = Artist.ArtistId""")
Name | Title | Name | Name | |
---|---|---|---|---|
0 | For Those About To Rock (We Salute You) | For Those About To Rock We Salute You | AC/DC | Rock |
1 | Balls to the Wall | Balls to the Wall | Accept | Rock |
2 | Fast As a Shark | Restless and Wild | Accept | Rock |
3 | Restless and Wild | Restless and Wild | Accept | Rock |
4 | Princess of the Dawn | Restless and Wild | Accept | Rock |
5 | Put The Finger On You | For Those About To Rock We Salute You | AC/DC | Rock |
6 | Let's Get It Up | For Those About To Rock We Salute You | AC/DC | Rock |
7 | Inject The Venom | For Those About To Rock We Salute You | AC/DC | Rock |
8 | Snowballed | For Those About To Rock We Salute You | AC/DC | Rock |
9 | Evil Walks | For Those About To Rock We Salute You | AC/DC | Rock |
10 | C.O.D. | For Those About To Rock We Salute You | AC/DC | Rock |
11 | Breaking The Rules | For Those About To Rock We Salute You | AC/DC | Rock |
12 | Night Of The Long Knives | For Those About To Rock We Salute You | AC/DC | Rock |
13 | Spellbound | For Those About To Rock We Salute You | AC/DC | Rock |
14 | Go Down | Let There Be Rock | AC/DC | Rock |
15 | Dog Eat Dog | Let There Be Rock | AC/DC | Rock |
16 | Let There Be Rock | Let There Be Rock | AC/DC | Rock |
17 | Bad Boy Boogie | Let There Be Rock | AC/DC | Rock |
18 | Problem Child | Let There Be Rock | AC/DC | Rock |
19 | Overdose | Let There Be Rock | AC/DC | Rock |
20 | Hell Ain't A Bad Place To Be | Let There Be Rock | AC/DC | Rock |
21 | Whole Lotta Rosie | Let There Be Rock | AC/DC | Rock |
22 | Walk On Water | Big Ones | Aerosmith | Rock |
23 | Love In An Elevator | Big Ones | Aerosmith | Rock |
24 | Rag Doll | Big Ones | Aerosmith | Rock |
25 | What It Takes | Big Ones | Aerosmith | Rock |
26 | Dude (Looks Like A Lady) | Big Ones | Aerosmith | Rock |
27 | Janie's Got A Gun | Big Ones | Aerosmith | Rock |
28 | Cryin' | Big Ones | Aerosmith | Rock |
29 | Amazing | Big Ones | Aerosmith | Rock |
... | ... | ... | ... | ... |
3473 | October Song | Frank | Amy Winehouse | Pop |
3474 | What Is It About Men | Frank | Amy Winehouse | Pop |
3475 | Help Yourself | Frank | Amy Winehouse | Pop |
3476 | Amy Amy Amy (Outro) | Frank | Amy Winehouse | Pop |
3477 | Slowness | Carried to Dust (Bonus Track Version) | Calexico | Alternative |
3478 | Prometheus Overture, Op. 43 | Beethoven: Symphony No. 6 'Pastoral' Etc. | Otto Klemperer & Philharmonia Orchestra | Classical |
3479 | Sonata for Solo Violin: IV: Presto | Bartok: Violin & Viola Concertos | Yehudi Menuhin | Classical |
3480 | A Midsummer Night's Dream, Op.61 Incidental Mu... | Mendelssohn: A Midsummer Night's Dream | Philharmonia Orchestra & Sir Neville Marriner | Classical |
3481 | Suite No. 3 in D, BWV 1068: III. Gavotte I & II | Bach: Orchestral Suites Nos. 1 - 4 | Academy of St. Martin in the Fields, Sir Nevil... | Classical |
3482 | Concert pour 4 Parties de V**les, H. 545: I. P... | Charpentier: Divertissements, Airs & Concerts | Les Arts Florissants & William Christie | Classical |
3483 | Adios nonino | South American Getaway | The 12 Cellists of The Berlin Philharmonic | Classical |
3484 | Symphony No. 3 Op. 36 for Orchestra and Sopran... | Górecki: Symphony No. 3 | Adrian Leaper & Doreen de Feis | Classical |
3485 | Act IV, Symphony | Purcell: The Fairy Queen | Roger Norrington, London Classical Players | Classical |
3486 | 3 Gymnopédies: No.1 - Lent Et Grave, No.3 - Le... | The Ultimate Relexation Album | Charles Dutoit & L'Orchestre Symphonique de Mo... | Classical |
3487 | Music for the Funeral of Queen Mary: VI. "Thou... | Purcell: Music for the Queen Mary | Equale Brass Ensemble, John Eliot Gardiner & M... | Classical |
3488 | Symphony No. 2: III. Allegro vivace | Weill: The Seven Deadly Sins | Kent Nagano and Orchestre de l'Opéra de Lyon | Classical |
3489 | Partita in E Major, BWV 1006A: I. Prelude | J.S. Bach: Chaconne, Suite in E Minor, Partita... | Julian Bream | Classical |
3490 | Le Sacre Du Printemps: I.iv. Spring Rounds | Prokofiev: Symphony No.5 & Stravinksy: Le Sacr... | Berliner Philharmoniker & Herbert Von Karajan | Classical |
3491 | Sing Joyfully | English Renaissance | The King's Singers | Classical |
3492 | Metopes, Op. 29: Calypso | Szymanowski: Piano Works, Vol. 1 | Martin Roscoe | Classical |
3493 | Symphony No. 2, Op. 16 - "The Four Temperamen... | Nielsen: The Six Symphonies | Göteborgs Symfoniker & Neeme Järvi | Classical |
3494 | 24 Caprices, Op. 1, No. 24, for Solo Violin, i... | Great Recordings of the Century: Paganini's 24... | Itzhak Perlman | Classical |
3495 | Étude 1, In C Major - Preludio (Presto) - Liszt | Liszt - 12 Études D'Execution Transcendante | Michele Campanella | Classical |
3496 | Erlkonig, D.328 | Great Recordings of the Century - Shubert: Sch... | Gerald Moore | Classical |
3497 | Concerto for Violin, Strings and Continuo in G... | Locatelli: Concertos for Violin, Strings and C... | Mela Tenenbaum, Pro Musica Prague & Richard Kapp | Classical |
3498 | Pini Di Roma (Pinien Von Rom) \ I Pini Della V... | Respighi:Pines of Rome | Eugene Ormandy | Classical |
3499 | String Quartet No. 12 in C Minor, D. 703 "Quar... | Schubert: The Late String Quartets & String Qu... | Emerson String Quartet | Classical |
3500 | L'orfeo, Act 3, Sinfonia (Orchestra) | Monteverdi: L'Orfeo | C. Monteverdi, Nigel Rogers - Chiaroscuro; Lon... | Classical |
3501 | Quintet for Horn, Violin, 2 Violas, and Cello ... | Mozart: Chamber Music | Nash Ensemble | Classical |
3502 | Koyaanisqatsi | Koyaanisqatsi (Soundtrack from the Motion Pict... | Philip Glass Ensemble | Soundtrack |
3503 rows × 4 columns
Artists with the most tracks¶
In [17]:
get_results("""SELECT COUNT(Track.Name) as NumTracks, Artist.Name FROM Track
JOIN Album on Track.AlbumId = Album.AlbumId
JOIN Artist on Album.ArtistId = Artist.ArtistId
GROUP BY Artist.Name ORDER BY NumTracks DESC""").head(10)
Out[17]:
NumTracks | Name | |
---|---|---|
0 | 213 | Iron Maiden |
1 | 135 | U2 |
2 | 114 | Led Zeppelin |
3 | 112 | Metallica |
4 | 92 | Deep Purple |
5 | 92 | Lost |
6 | 67 | Pearl Jam |
7 | 57 | Lenny Kravitz |
8 | 56 | Various Artists |
9 | 53 | The Office |
More table structures for sales data¶
In [18]:
show_results("""PRAGMA table_info(Customer)""")
show_results("""PRAGMA table_info(Invoice)""")
show_results("""PRAGMA table_info(InvoiceLine)""")
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | CustomerId | INTEGER | 1 | None | 1 |
1 | 1 | FirstName | NVARCHAR(40) | 1 | None | 0 |
2 | 2 | LastName | NVARCHAR(20) | 1 | None | 0 |
3 | 3 | Company | NVARCHAR(80) | 0 | None | 0 |
4 | 4 | Address | NVARCHAR(70) | 0 | None | 0 |
5 | 5 | City | NVARCHAR(40) | 0 | None | 0 |
6 | 6 | State | NVARCHAR(40) | 0 | None | 0 |
7 | 7 | Country | NVARCHAR(40) | 0 | None | 0 |
8 | 8 | PostalCode | NVARCHAR(10) | 0 | None | 0 |
9 | 9 | Phone | NVARCHAR(24) | 0 | None | 0 |
10 | 10 | Fax | NVARCHAR(24) | 0 | None | 0 |
11 | 11 | NVARCHAR(60) | 1 | None | 0 | |
12 | 12 | SupportRepId | INTEGER | 0 | None | 0 |
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | InvoiceId | INTEGER | 1 | None | 1 |
1 | 1 | CustomerId | INTEGER | 1 | None | 0 |
2 | 2 | InvoiceDate | DATETIME | 1 | None | 0 |
3 | 3 | BillingAddress | NVARCHAR(70) | 0 | None | 0 |
4 | 4 | BillingCity | NVARCHAR(40) | 0 | None | 0 |
5 | 5 | BillingState | NVARCHAR(40) | 0 | None | 0 |
6 | 6 | BillingCountry | NVARCHAR(40) | 0 | None | 0 |
7 | 7 | BillingPostalCode | NVARCHAR(10) | 0 | None | 0 |
8 | 8 | Total | NUMERIC(10,2) | 1 | None | 0 |
cid | name | type | notnull | dflt_value | pk | |
---|---|---|---|---|---|---|
0 | 0 | InvoiceLineId | INTEGER | 1 | None | 1 |
1 | 1 | InvoiceId | INTEGER | 1 | None | 0 |
2 | 2 | TrackId | INTEGER | 1 | None | 0 |
3 | 3 | UnitPrice | NUMERIC(10,2) | 1 | None | 0 |
4 | 4 | Quantity | INTEGER | 1 | None | 0 |
Total Spending by State/Country¶
In [19]:
get_results("""SELECT SUM(InvoiceLine.UnitPrice * InvoiceLine.Quantity) AS Spending, Customer.State, Customer.Country
FROM InvoiceLine
JOIN Invoice ON InvoiceLine.InvoiceId = Invoice.InvoiceId
JOIN Customer ON Invoice.CustomerId = Customer.CustomerId
GROUP BY Customer.State, Customer.Country ORDER BY Spending DESC""")
Out[19]:
Spending | State | Country | |
---|---|---|---|
0 | 195.10 | None | France |
1 | 156.48 | None | Germany |
2 | 115.86 | CA | USA |
3 | 114.86 | SP | Brazil |
4 | 112.86 | None | United Kingdom |
5 | 90.24 | None | Czech Republic |
6 | 77.24 | None | Portugal |
7 | 75.26 | None | India |
8 | 75.24 | ON | Canada |
9 | 47.62 | TX | USA |
10 | 46.62 | None | Chile |
11 | 45.62 | None | Hungary |
12 | 45.62 | Dublin | Ireland |
13 | 43.62 | IL | USA |
14 | 43.62 | UT | USA |
15 | 42.62 | WI | USA |
16 | 42.62 | None | Austria |
17 | 41.62 | None | Finland |
18 | 40.62 | VV | Netherlands |
19 | 39.62 | None | Norway |
20 | 39.62 | FL | USA |
21 | 39.62 | QC | Canada |
22 | 39.62 | WA | USA |
23 | 38.62 | None | Sweden |
24 | 38.62 | BC | Canada |
25 | 37.62 | None | Argentina |
26 | 37.62 | None | Belgium |
27 | 37.62 | None | Denmark |
28 | 37.62 | None | Poland |
29 | 37.62 | None | Spain |
30 | 37.62 | AB | Canada |
31 | 37.62 | AZ | USA |
32 | 37.62 | DF | Brazil |
33 | 37.62 | MA | USA |
34 | 37.62 | MB | Canada |
35 | 37.62 | NS | Canada |
36 | 37.62 | NSW | Australia |
37 | 37.62 | NT | Canada |
38 | 37.62 | NV | USA |
39 | 37.62 | NY | USA |
40 | 37.62 | RJ | Brazil |
41 | 37.62 | RM | Italy |
Subqueries¶
This is a subquery, where we nest multiple SELECT
statements. We
must name the parenthesized subquery, and here we call it Subquery
.
Then in the parent SELECT
statement, we can access all of the
subquery columns via Subquery.*
.
In [20]:
show_results("""SELECT Subquery.* FROM
(SELECT Artist.Name, COUNT(Artist.Name) as ArtistCount, Customer.State, Customer.Country
FROM InvoiceLine
JOIN Invoice ON InvoiceLine.InvoiceId = Invoice.InvoiceId
JOIN Customer ON Invoice.CustomerId = Customer.CustomerId
JOIN Track ON InvoiceLine.TrackId = Track.TrackId
JOIN Album ON Track.TrackId = Album.AlbumId
JOIN Artist ON Artist.ArtistId = Album.ArtistId
GROUP BY Artist.Name, Customer.State, Customer.Country) AS Subquery
""")
Name | ArtistCount | State | Country | |
---|---|---|---|---|
0 | AC/DC | 1 | None | Germany |
1 | AC/DC | 1 | RM | Italy |
2 | Academy of St. Martin in the Fields & Sir Nevi... | 1 | None | United Kingdom |
3 | Academy of St. Martin in the Fields & Sir Nevi... | 1 | SP | Brazil |
4 | Academy of St. Martin in the Fields Chamber En... | 1 | MB | Canada |
5 | Accept | 1 | None | Germany |
6 | Accept | 1 | DF | Brazil |
7 | Accept | 1 | NT | Canada |
8 | Adrian Leaper & Doreen de Feis | 1 | NV | USA |
9 | Aerosmith | 1 | RM | Italy |
10 | Aisha Duo | 1 | SP | Brazil |
11 | Alanis Morissette | 1 | None | Norway |
12 | Amy Winehouse | 1 | None | United Kingdom |
13 | Amy Winehouse | 1 | NV | USA |
14 | Anne-Sophie Mutter, Herbert Von Karajan & Wien... | 1 | MB | Canada |
15 | Antal Doráti & London Symphony Orchestra | 1 | None | France |
16 | Antônio Carlos Jobim | 1 | None | Norway |
17 | Antônio Carlos Jobim | 1 | NT | Canada |
18 | Apocalyptica | 1 | DF | Brazil |
19 | Apocalyptica | 1 | RM | Italy |
20 | Audioslave | 1 | None | Norway |
21 | Audioslave | 1 | SP | Brazil |
22 | BackBeat | 1 | None | Norway |
23 | Battlestar Galactica (Classic) | 1 | None | Sweden |
24 | Berliner Philharmoniker & Hans Rosbaud | 1 | NV | USA |
25 | Berliner Philharmoniker & Herbert Von Karajan | 1 | None | United Kingdom |
26 | Berliner Philharmoniker & Herbert Von Karajan | 1 | SP | Brazil |
27 | Billy Cobham | 1 | RM | Italy |
28 | Black Label Society | 1 | DF | Brazil |
29 | Black Label Society | 1 | NT | Canada |
... | ... | ... | ... | ... |
171 | The Clash | 1 | CA | USA |
172 | The Cult | 1 | CA | USA |
173 | The Doors | 1 | None | Czech Republic |
174 | The Office | 1 | MB | Canada |
175 | The Police | 1 | FL | USA |
176 | The Rolling Stones | 1 | None | Czech Republic |
177 | The Rolling Stones | 1 | None | Sweden |
178 | The Rolling Stones | 1 | FL | USA |
179 | The Rolling Stones | 1 | MA | USA |
180 | The Tea Party | 1 | FL | USA |
181 | The Who | 1 | FL | USA |
182 | Tim Maia | 1 | None | Czech Republic |
183 | Tim Maia | 1 | None | Sweden |
184 | Titãs | 1 | TX | USA |
185 | U2 | 4 | None | France |
186 | U2 | 1 | None | Germany |
187 | U2 | 1 | None | Sweden |
188 | U2 | 2 | RJ | Brazil |
189 | U2 | 2 | TX | USA |
190 | UB40 | 1 | None | Sweden |
191 | UB40 | 1 | TX | USA |
192 | Van Halen | 2 | None | France |
193 | Van Halen | 1 | TX | USA |
194 | Various Artists | 1 | None | Belgium |
195 | Various Artists | 1 | None | France |
196 | Various Artists | 1 | NT | Canada |
197 | Velvet Revolver | 1 | RJ | Brazil |
198 | Vinícius De Moraes | 1 | None | Sweden |
199 | Yehudi Menuhin | 1 | SP | Brazil |
200 | Zeca Pagodinho | 1 | Dublin | Ireland |
201 rows × 4 columns
Total Spending by Artist¶
This uses a subquery to get the sales by track, and then joins that to
the Artists
table so we can see the total sales by artist.
In [21]:
show_results("""SELECT Artist.Name as ArtistName, SUM(TrackSales.Sales) AS ArtistSales FROM (
SELECT Track.TrackId, Track.AlbumId, Track.Name, SUM(InvoiceLine.UnitPrice * InvoiceLine.Quantity) AS Sales
FROM Track JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
GROUP BY Track.TrackId
) AS TrackSales
JOIN Album on TrackSales.AlbumId = Album.AlbumId
JOIN Artist on Album.ArtistId = Artist.ArtistId
GROUP BY Artist.Name ORDER BY ArtistSales DESC
""")
ArtistName | ArtistSales | |
---|---|---|
0 | Iron Maiden | 138.60 |
1 | U2 | 105.93 |
2 | Metallica | 90.09 |
3 | Led Zeppelin | 86.13 |
4 | Lost | 81.59 |
5 | The Office | 49.75 |
6 | Os Paralamas Do Sucesso | 44.55 |
7 | Deep Purple | 43.56 |
8 | Faith No More | 41.58 |
9 | Eric Clapton | 39.60 |
10 | R.E.M. | 38.61 |
11 | Creedence Clearwater Revival | 36.63 |
12 | Queen | 36.63 |
13 | Battlestar Galactica (Classic) | 35.82 |
14 | Guns N' Roses | 35.64 |
15 | Titãs | 33.66 |
16 | Green Day | 32.67 |
17 | Pearl Jam | 31.68 |
18 | Kiss | 30.69 |
19 | Van Halen | 28.71 |
20 | Various Artists | 28.71 |
21 | Red Hot Chili Peppers | 26.73 |
22 | Chico Buarque | 26.73 |
23 | Heroes | 25.87 |
24 | Lenny Kravitz | 25.74 |
25 | Chico Science & Nação Zumbi | 24.75 |
26 | Battlestar Galactica | 23.88 |
27 | Ozzy Osbourne | 23.76 |
28 | Smashing Pumpkins | 23.76 |
29 | Tim Maia | 23.76 |
... | ... | ... |
135 | Academy of St. Martin in the Fields, John Birc... | 0.99 |
136 | Adrian Leaper & Doreen de Feis | 0.99 |
137 | Antal Doráti & London Symphony Orchestra | 0.99 |
138 | Berliner Philharmoniker & Hans Rosbaud | 0.99 |
139 | Calexico | 0.99 |
140 | Dread Zeppelin | 0.99 |
141 | Edo de Waart & San Francisco Symphony | 0.99 |
142 | Emanuel Ax, Eugene Ormandy & Philadelphia Orch... | 0.99 |
143 | English Concert & Trevor Pinnock | 0.99 |
144 | Felix Schmidt, London Symphony Orchestra & Raf... | 0.99 |
145 | Göteborgs Symfoniker & Neeme Järvi | 0.99 |
146 | Habib Koité and Bamada | 0.99 |
147 | Julian Bream | 0.99 |
148 | Kent Nagano and Orchestre de l'Opéra de Lyon | 0.99 |
149 | Luciana Souza/Romero Lubambo | 0.99 |
150 | Martin Roscoe | 0.99 |
151 | Maurizio Pollini | 0.99 |
152 | Michele Campanella | 0.99 |
153 | Orchestra of The Age of Enlightenment | 0.99 |
154 | Otto Klemperer & Philharmonia Orchestra | 0.99 |
155 | Philharmonia Orchestra & Sir Neville Marriner | 0.99 |
156 | Roger Norrington, London Classical Players | 0.99 |
157 | Royal Philharmonic Orchestra & Sir Thomas Beecham | 0.99 |
158 | Scholars Baroque Ensemble | 0.99 |
159 | Sir Georg Solti & Wiener Philharmoniker | 0.99 |
160 | The 12 Cellists of The Berlin Philharmonic | 0.99 |
161 | The King's Singers | 0.99 |
162 | The Posies | 0.99 |
163 | Yehudi Menuhin | 0.99 |
164 | Yo-Yo Ma | 0.99 |
165 rows × 2 columns
Searching Databases¶
Using the keyword LIKE
, we can search for strings that are partial
matches. Using the percent sign %
is kind of like using an asterisk
*
in the terminal, it just fills in one or more other characters.
In [22]:
show_results("""SELECT * FROM Artist WHERE Name LIKE (?)""", params=('%foo%',))
show_results("""SELECT * FROM Artist WHERE Name LIKE (?)""", params=('bar%',))
ArtistId | Name | |
---|---|---|
0 | 84 | Foo Fighters |
ArtistId | Name | |
---|---|---|
0 | 48 | Barão Vermelho |
1 | 224 | Barry Wordsworth & BBC Concert Orchestra |
Scientific databases¶
Acquiring quality data with the information you might need for particular research questions is still very hard. Many databases found online are of low quality, or small breadth / depth. Here are some databases of various kinds of materials and bioactive molecules. In many cases, domain scientists/engineers and data scientists must collaborate to make the most out of limited information.
Hosting a production database¶
If reliability is important and/or you have some money and want to get going quickly, use a cloud provider like Amazon Web Services or Google Cloud. - https://aws.amazon.com/products/databases/ - https://cloud.google.com/sql/
Exploring more on your own¶
To learn how databases work on your own, install the mysql-server
package on a server you have access to, or follow instructions for a
LAMP (Linux, Apache, MySQL, PHP) server. The second link below has
demonstrations for all kinds of other databases I mentioned previously,
including several NoSQL databases. -
https://www.linode.com/docs/web-servers/lamp/install-lamp-stack-on-ubuntu-18-04/
- https://www.linode.com/docs/databases/
Administrative interfaces¶
To administer MySQL databases, I really like PHPMyAdmin. It’s how I learned to use databases, because its web interface shows the SQL commands when you click on the buttons. - https://www.phpmyadmin.net/
Sources¶
Database reference material: - https://en.wikipedia.org/wiki/NoSQL - https://docs.python.org/3.7/library/sqlite3.html - https://community.modeanalytics.com/sql/tutorial/sql-joins/ - http://joins.spathon.com/ - https://en.wikipedia.org/wiki/Database_normalization - http://www.sqlitetutorial.net/sqlite-sample-database/
Other Python packages used: - https://networkx.github.io/documentation/stable/ - https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.read_sql.html
Raw data: - https://en.wikipedia.org/wiki/List_of_Marvel_Cinematic_Universe_film_actors - https://github.com/lerocha/chinook-database