World Cup Database Project¶
This project is part of the FreeCodeCamp Relational Database Certification course. It demonstrates the creation and manipulation of a PostgreSQL database containing World Cup tournament data from 2014 and 2018.
📋 Project Overview¶
The World Cup Database project involves:
- Creating a PostgreSQL database to store World Cup tournament data
- Implementing database schema with proper relationships
- Importing data from CSV files using shell scripts
- Writing complex SQL queries to extract meaningful insights
🗄️ Database Schema¶
The database consists of two main tables:
Database Schema
- team_id: Primary key, auto-incrementing integer
- name: Unique team name (e.g., 'France', 'Croatia')
CREATE TABLE public.games (
game_id integer NOT NULL,
winner_id integer NOT NULL,
opponent_id integer NOT NULL,
winner_goals integer NOT NULL,
opponent_goals integer NOT NULL,
year integer NOT NULL,
round character varying(255) NOT NULL
);
- game_id: Primary key, auto-incrementing integer
- winner_id: Foreign key referencing teams.team_id
- opponent_id: Foreign key referencing teams.team_id
- winner_goals: Number of goals scored by the winning team
- opponent_goals: Number of goals scored by the opponent team
- year: Tournament year (2014 or 2018)
- round: Tournament round (Final, Semi-Final, Quarter-Final, etc.)
Relationships
games.winner_id
→teams.team_id
games.opponent_id
→teams.team_id
📁 Project Structure¶
fcc-rdb-worldcupdb/
├── insert_data.sh # Script to populate the database
├── queries.sh # Script with predefined queries
├── worldcup.sql # Database dump file
├── games.csv # Source data file
└── README.md # This documentation
🚀 Setup Instructions¶
- Prerequisites
- PostgreSQL installed and running
- Bash shell environment
-
CSV data file (games.csv)
-
Database Setup
- Data Import
🔧 Scripts Description¶
-
insert_data.sh
This script processes the CSV file and populates the database:
Key Features:
- Handles both test and production environments
- Truncates existing data before insertion
- Reads CSV line by line using IFS (Internal Field Separator)
- Checks for existing teams before insertion
- Maintains referential integrity
Logic Flow:
- Truncate both tables to start fresh
- For each CSV row:
- Check if winner team exists, insert if not
- Check if opponent team exists, insert if not
- Insert game record with team IDs
insert_data.sh
#! /bin/bash if [[ $1 == "test" ]] then PSQL="psql --username=postgres --dbname=worldcuptest -t --no-align -c" else PSQL="psql --username=freecodecamp --dbname=worldcup -t --no-align -c" fi # Do not change code above this line. Use the PSQL variable above to query your database. echo $($PSQL "TRUNCATE TABLE games, teams") cat games.csv | while IFS="," read YR RND WIN OPP W_GOAL O_GOAL do # echo $YR, $RND, $WIN TEAMW=$($PSQL "SELECT team_id FROM teams WHERE name='$WIN'") if [[ $WIN != "winner" ]] then if [[ -z $TEAMW ]] then IN_W_ID=$($PSQL "INSERT INTO teams(name) VALUES('$WIN')") if [[ $IN_W_ID == "INSERT 0 1" ]] then echo Inserted into winner, $WIN fi fi fi TEAML=$($PSQL "SELECT team_id FROM teams WHERE name='$OPP'") if [[ $OPP != "opponent" ]] then if [[ -z $TEAML ]] then IN_O_ID=$($PSQL "INSERT INTO teams(name) VALUES('$OPP')") if [[ $IN_W_ID == "INSERT 0 1" ]] then echo Inserted into opponent, $OPP fi fi fi TEAM_ID_W=$($PSQL "SELECT team_id FROM teams WHERE name='$WIN'") TEAM_ID_L=$($PSQL "SELECT team_id FROM teams WHERE name='$OPP'") if [[ -n $EAM_ID_W || -n $TEAM_ID_L ]] then if [[ $YR != "year" ]] then IN_GAME=$($PSQL "INSERT INTO games(winner_id, opponent_id, winner_goals, opponent_goals, year, round) VALUES('$TEAM_ID_W','$TEAM_ID_L','$W_GOAL','$O_GOAL','$YR','$RND')") if [[ $IN_GAME == "INSERT 0 1" ]] then echo Insert game history, $YR fi fi fi done
-
queries.sh
Contains predefined queries demonstrating various SQL operations:
Query Examples:
- Aggregate functions (SUM, AVG, COUNT, MAX)
- JOIN operations between tables
- Filtering with WHERE clauses
- String pattern matching with LIKE
- Data formatting and type casting
📊 queries.sh Tasks and SQL Execution¶
The queries.sh
script contains 12 specific database queries required by the FreeCodeCamp assignment. Each query demonstrates different SQL concepts and returns specific tournament statistics.
-
Execution Instructions Run the queries script:
-
Query Tasks and Expected Results
Task: Calculate the sum of all goals scored by winning teams
Expected Output:68
Task: Calculate the sum of all goals scored by both teams combined
Expected Output:90
Task: Calculate the average number of goals scored by winning teams
Expected Output:2.1250000000000000
Task: Same as above but rounded to 2 decimal places
Expected Output:2.13
Task: Calculate the average total goals per game from both teams
Expected Output:2.8125000000000000
Task: Find the maximum goals scored by one team in a single game
Expected Output:7
Task: Count games where the winning team scored more than 2 goals
Expected Output:6
Task: Get the name of the team that won the 2018 World Cup
Expected Output:SELECT name FROM teams INNER JOIN games ON teams.team_id = games.winner_id WHERE round = 'Final' AND year = 2018;
France
Task: List all teams that played in the 2014 Eighth-Final round
Expected Output:Task: List all unique team names that won at least one game
Expected Output: All teams that appear as winners (16 teams) -
SQL Concepts Demonstrated
-
Aggregate Functions
- SUM(): Total calculations across multiple records
- AVG(): Mean values with optional formatting
- COUNT(): Record counting with conditions
- MAX(): Finding maximum values
-
JOIN Operations
- INNER JOIN: Combining related tables
- Multiple JOIN conditions: Using OR for winner/opponent relationships
- Table aliases: Simplifying complex queries
-
Data Filtering
- WHERE clauses: Conditional filtering
- Comparison operators:
>
,=
,OR
- Pattern matching:
LIKE
with wildcards
-
Data Formatting
- Type casting:
::NUMERIC(10,2)
for decimal precision - DISTINCT: Removing duplicate results
- ORDER BY: Sorting results alphabetically/numerically
- Type casting:
-
Advanced Techniques
- Subqueries: Implicit through JOIN operations
- Conditional logic: Multiple WHERE conditions
- String operations: Pattern matching with LIKE
-
-
Testing Your Implementation
To verify your queries work correctly:
-
Run the script:
-
Compare outputs: Match your results with the expected outputs above
-
Debug common issues:
- Ensure database is properly populated
- Check for typos in SQL syntax
- Verify table relationships are correct
-
-
Performance Notes
- Queries use indexed columns (primary/foreign keys) for optimal performance
- JOIN operations are efficient due to proper relationship design
- DISTINCT operations may be slower on larger datasets
🏆 Data Coverage¶
The database contains:
- Years: 2014 and 2018 World Cup tournaments
- Teams: 24 unique national teams
- Games: 32 tournament matches
- Rounds: Final, Third Place, Semi-Final, Quarter-Final, Eighth-Final
Teams Included
France, Croatia, Belgium, England, Russia, Sweden, Brazil, Uruguay, Colombia, Switzerland, Japan, Mexico, Denmark, Spain, Portugal, Argentina, Germany, Netherlands, Costa Rica, Chile, Nigeria, Algeria, Greece, United States
🎯 Learning Objectives¶
This project demonstrates proficiency in:
-
Database Design
- Creating normalized table structures
- Implementing foreign key relationships
- Setting up proper constraints
-
Data Import/ETL
- Processing CSV files with shell scripts
- Handling data validation and deduplication
- Maintaining referential integrity during import
-
SQL Querying
- Writing complex JOIN queries
- Using aggregate functions
- Implementing conditional logic
- String manipulation and pattern matching
-
Shell Scripting
- File processing and data manipulation
- PostgreSQL integration
- Environment-specific configurations
🔍 Key Technical Concepts¶
- Normalization: Separating teams and games into related tables
- Foreign Keys: Ensuring data integrity across tables
- PSQL Integration: Using command-line PostgreSQL operations
- CSV Processing: Reading and parsing structured data files
- Conditional Logic: Implementing data validation in shell scripts
📈 Potential Extensions¶
Future enhancements could include:
- Additional tournament years
- Player statistics and lineups
- Match locations and venues
- Tournament bracket visualization
- Performance analytics and trends
🏅 Course Context¶
This project is part of the FreeCodeCamp Relational Database Certification, specifically the "Build a World Cup Database" project. It serves as a practical application of database concepts including:
- Database creation and management
- Table relationships and foreign keys
- Data import and validation
- Complex SQL query writing
- Shell script automation
The project demonstrates real-world database skills applicable to sports analytics, data management, and backend development roles.