Comprehensive data platform (Python + pandas/SQLAlchemy, Dockerised SQL Server, T-SQL, PowerShell automation, SQL Server Management Studio, Power BI/DAX) that blends Trade Map trade statistics with World Bank macro indicators to understand how Portuguese ceramic exporters compete worldwide. The project demonstrates end-to-end ownership of data ingestion, dimensional modeling, SQL Server automation under Docker, and Power BI storytelling.
![]() |
![]() |
|---|---|
![]() |
![]() |
DIM_COUNTRY, DIM_PRODUCT, DIM_DATE dimensions with surrogate keys.Q4).CALC_*) containing pre-computed growth, share, tariff, and logistics metrics.etl/ingest_csv.py, sql/10_staging.sql, sql/20_dimensions.sql, sql/30_facts.sql).Power BI/DAX.md, Power BI/Dashboard.pbix) for interactive analysis.Portuguese ceramics compete globally across tiles, sanitaryware, mosaics, and related HS 69xx segments. Market prioritisation requires a consolidated view of:
| Role | Needs | Assets |
|---|---|---|
| Executive board | Country prioritisation, KPI snapshots | Power BI dashboard |
| Market analysts | Deep dives by HS code or macro driver | Fact tables + DAX measures |
| Data engineering | Repeatable ingestion, QA | etl/, sql/ scripts |
| Trade team | Quick data extracts | SQL views and calc tables |
CALC_* tables keep dashboards responsive even with heavy KPI logic.data/.flowchart LR
A[Trade Map / World Bank CSVs] --> B[Python Ingestion - etl/ingest_csv.py]
B --> C[SQL Staging Views - sql/10_staging.sql]
C --> D[Dimensions - sql/20_dimensions.sql]
D --> E[Facts and Snapshots - sql/30_facts.sql]
E --> F[Power BI Model - Power BI/Dashboard.pbix]
CeramicsWorldDB), recreated every load for reproducibility.Power BI/DAX.md).| Provider | Dataset | Table(s) | Notes |
|---|---|---|---|
| Trade Map | Importers historic (HS 69xx) | FACT_IMP |
Annual values 2005-2024 (Q4 proxy). |
| Trade Map | Exporters historic (Portugal) | FACT_EXP_PT, FACT_EXP_PROD_BY_PT |
Destination and HS code exports. |
| Trade Map | Importers by segment (6907/6908/6910) | FACT_IMP_SEGMENT |
Country x product intersection. |
| Trade Map | Construction services (imports/exports) | FACT_IMP_SECTOR, FACT_EXP_SECTOR_BY_PT |
Single quarterly lines (world vs Portugal). |
| Trade Map | 2024 exporter/importer snapshots | CALC_EXP_*, CALC_IMP_* |
KPIs: growth, tariffs, distance, rankings. |
| World Bank | GDP per capita (NY.GDP.PCAP.CD) | FACT_PIB |
USD, annual. |
| World Bank | Urban population (SP.URB.TOTL) | FACT_URBAN |
Absolute population. |
| World Bank | Industry incl. construction growth (NV.IND.TOTL.KD.ZG) | FACT_CONSTRUCTION |
Annual percent growth. |
DIM_COUNTRY: ISO3, continent, region, slug for SEO-friendly routing.DIM_PRODUCT: HS 4-digit, section, chapter labels.DIM_DATE: year/quarter pairs with decade label; annual series write to the Q4 row to keep joins uniform.FACT_EXP_PT, FACT_EXP, FACT_IMP, FACT_IMP_PT, FACT_EXP_PROD_BY_PT, FACT_IMP_PROD, FACT_IMP_SEGMENT).FACT_EXP_SECTOR_BY_PT, FACT_IMP_SECTOR) referencing only DIM_DATE.FACT_PIB, FACT_URBAN, FACT_CONSTRUCTION).Detailed specs live in docs/Modelo_Conceptual.md, docs/Modelo_Logico.md, docs/Modelo_Fisico.md, docs/Modelo_Relacional.md.
CALC_* table uses the relevant dimension key as primary key (id_country or id_product).sql/30_facts.sql drops and recreates them each run to avoid drift.MSSQL_HOST, MSSQL_DB, MSSQL_USER, MSSQL_PASSWORD, DATA_PATH (see docker/.env template).python etl/ingest_csv.py):
data/**.csv.imports_products_csv_trade_map_list_of_imported_products_for_the_selected_product_ceramic_products_xls).sql/10_staging.sql):
staging.ref_country_lookup.staging.ref_hs_product.sql/20_dimensions.sql):
sql/30_facts.sql):
ON DELETE NO ACTION.data/ # Raw Trade Map and World Bank CSVs
docker/ # Compose files / env samples
docs/ # Conceptual, logical, physical, relational models
etl/ # Python ingestion + helpers
img/ # Dashboard screenshots
powerbi/ # Exploratory notebooks, clustering, PCA
Power BI/ # DAX measures + Dashboard.pbix
sql/ # 10_staging, 20_dimensions, 30_facts scripts
Power BI/DAX.md: curated measures (shares, rankings, clustering, outlier detection) standardising logic.Power BI/Dashboard.pbix: enterprise storytelling with:
CALC_* KPIs.docker/.env, set passwords/volumes, and adjust port mappings if needed.cd docker
docker compose up -d
(You can skip this block if you already have an on-prem/Azure SQL Server available.)
pip install -r etl/requirements.txt.env (or docker/.env) with SQL Server credentials.data/.python etl/ingest_csv.pysqlcmd -d CeramicsWorldDB -i sql/10_staging.sql
sqlcmd -d CeramicsWorldDB -i sql/20_dimensions.sql
sqlcmd -d CeramicsWorldDB -i sql/30_facts.sql
Power BI/Dashboard.pbix, refresh connections, publish/report.CeramicsWorldDB (local or Docker-exposed) for ad-hoc queries, data validation, and performance monitoring.sql/ can be executed directly from SSMS if you prefer GUI-based deployments over sqlcmd.sql/30_facts.sql comments for expectations).CALC_* values with Trade Map CSVs (spot-check Portugal and top importers).Q4 row in DIM_DATE; new years require only appending staging data.sql/30_facts.sql whenever new Trade Map 2025+ files arrive to regenerate KPIs.staging.ref_hs_product and re-running ingestion.sql/30_facts.sql.Invoke-Sqlcmd or GitHub Actions.powerbi/.Educational project released under the MIT License.