Data-Warehouse-Power-Bi-Ceramics-World

Ceramics World - Data Warehouse & Power BI

Comprehensive data platform (Python + pandas/SQLAlchemy, Dockerised SQL Server, T-SQL, PowerShell automation, SQL Server Management Studio, Power BI/DAX) that blends Trade Map trade statistics with World Bank macro indicators to understand how Portuguese ceramic exporters compete worldwide. The project demonstrates end-to-end ownership of data ingestion, dimensional modeling, SQL Server automation under Docker, and Power BI storytelling.


Dashboard overview Segment deep dive
Macro correlation Snapshot callouts

1. Executive Summary


2. Index

  1. Executive Summary
  2. Index
  3. Business Context
  4. Objectives
  5. Stakeholders and User Roles
  6. Why This DW Matters
  7. Solution Architecture
  8. Source Data Inventory
  9. Data Modeling Highlights
  10. Snapshot and KPI Strategy
  11. ETL / ELT Pipeline
  12. Repository Layout
  13. Power BI Deliverables
  14. Getting Started
  15. Validation and Testing
  16. Operations and Maintenance
  17. Roadmap
  18. License

3. Business Context

Portuguese ceramics compete globally across tiles, sanitaryware, mosaics, and related HS 69xx segments. Market prioritisation requires a consolidated view of:


4. Objectives

  1. Unify trade and macro data into a single semantic model optimised for Power BI consumption.
  2. Quantify performance of Portugal vs global peers, highlighting growth, share, and concentration.
  3. Enable scenario analysis by surfacing macro correlations and HS-segment breakdowns.
  4. Provide reproducibility: scripted ETL ensures anyone can rebuild the warehouse end-to-end.

5. Stakeholders and User Roles

Role Needs Assets
Executive board Country prioritisation, KPI snapshots Power BI dashboard
Market analysts Deep dives by HS code or macro driver Fact tables + DAX measures
Data engineering Repeatable ingestion, QA etl/, sql/ scripts
Trade team Quick data extracts SQL views and calc tables

6. Why This DW Matters


7. Solution Architecture

flowchart LR
    A[Trade Map / World Bank CSVs] --> B[Python Ingestion - etl/ingest_csv.py]
    B --> C[SQL Staging Views - sql/10_staging.sql]
    C --> D[Dimensions - sql/20_dimensions.sql]
    D --> E[Facts and Snapshots - sql/30_facts.sql]
    E --> F[Power BI Model - Power BI/Dashboard.pbix]

8. Source Data Inventory

Provider Dataset Table(s) Notes
Trade Map Importers historic (HS 69xx) FACT_IMP Annual values 2005-2024 (Q4 proxy).
Trade Map Exporters historic (Portugal) FACT_EXP_PT, FACT_EXP_PROD_BY_PT Destination and HS code exports.
Trade Map Importers by segment (6907/6908/6910) FACT_IMP_SEGMENT Country x product intersection.
Trade Map Construction services (imports/exports) FACT_IMP_SECTOR, FACT_EXP_SECTOR_BY_PT Single quarterly lines (world vs Portugal).
Trade Map 2024 exporter/importer snapshots CALC_EXP_*, CALC_IMP_* KPIs: growth, tariffs, distance, rankings.
World Bank GDP per capita (NY.GDP.PCAP.CD) FACT_PIB USD, annual.
World Bank Urban population (SP.URB.TOTL) FACT_URBAN Absolute population.
World Bank Industry incl. construction growth (NV.IND.TOTL.KD.ZG) FACT_CONSTRUCTION Annual percent growth.

9. Data Modeling Highlights

Detailed specs live in docs/Modelo_Conceptual.md, docs/Modelo_Logico.md, docs/Modelo_Fisico.md, docs/Modelo_Relacional.md.


10. Snapshot and KPI Strategy


11. ETL / ELT Pipeline

  1. Environment config: set MSSQL_HOST, MSSQL_DB, MSSQL_USER, MSSQL_PASSWORD, DATA_PATH (see docker/.env template).
  2. CSV ingestion (python etl/ingest_csv.py):
    • Scans data/**.csv.
    • Generates safe SQL table names (imports_products_csv_trade_map_list_of_imported_products_for_the_selected_product_ceramic_products_xls).
    • Loads raw data into SQL Server staging schema.
  3. Staging views (sql/10_staging.sql):
    • Cleans country names via staging.ref_country_lookup.
    • Normalises HS metadata through staging.ref_hs_product.
    • Unpivots annual/quarterly columns, converts percentages to decimals, enforces ISO3 lookups.
  4. Dimensions (sql/20_dimensions.sql):
    • Rebuilds surrogate keys, ensuring only validated ISO3/HS inputs survive.
  5. Facts and calc tables (sql/30_facts.sql):
    • Loads trade, macro, and service facts.
    • Creates KPI snapshots.
    • Applies referential constraints and ON DELETE NO ACTION.

12. Repository Layout

data/            # Raw Trade Map and World Bank CSVs
docker/          # Compose files / env samples
docs/            # Conceptual, logical, physical, relational models
etl/             # Python ingestion + helpers
img/             # Dashboard screenshots
powerbi/         # Exploratory notebooks, clustering, PCA
Power BI/        # DAX measures + Dashboard.pbix
sql/             # 10_staging, 20_dimensions, 30_facts scripts

13. Power BI Deliverables


14. Getting Started

Virtualised SQL Server with Docker

  1. Copy docker/.env, set passwords/volumes, and adjust port mappings if needed.
  2. Provision the SQL Server + loader helper stack:
    cd docker
    docker compose up -d
    
  3. The compose file virtualises the environment so you can rebuild the DW without installing SQL Server locally. Exposed ports let Power BI, SSMS, or scripts connect as if it were a native instance.

(You can skip this block if you already have an on-prem/Azure SQL Server available.)

Steps

  1. pip install -r etl/requirements.txt
  2. Populate .env (or docker/.env) with SQL Server credentials.
  3. Place Trade Map / World Bank CSVs under data/.
  4. Run ingestion: python etl/ingest_csv.py
  5. Execute SQL scripts in order:
    sqlcmd -d CeramicsWorldDB -i sql/10_staging.sql
    sqlcmd -d CeramicsWorldDB -i sql/20_dimensions.sql
    sqlcmd -d CeramicsWorldDB -i sql/30_facts.sql
    
  6. Open Power BI/Dashboard.pbix, refresh connections, publish/report.

SQL Server Management Studio (SSMS)


15. Validation and Testing


16. Operations and Maintenance


17. Roadmap

  1. Automate SQL deployment via Invoke-Sqlcmd or GitHub Actions.
  2. Add anomaly detection measures (seasonal ARIMA) to powerbi/.
  3. Integrate logistics/tariff APIs for near real-time snapshots.
  4. Publish semantic model to Fabric or Power BI Service as a certified dataset.

18. License

Educational project released under the MIT License.