A project completed as part of the Data Warehousing course.
Case: Build a powerful and user-friendly Python GUI application for real-world data analysis and machine learning. The project processes two distinct datasets: the UCI Adult Income dataset and the UCI Chronic Kidney Disease dataset, offering users a rich environment for data exploration, cleaning, transformation, and predictive modeling.
Tech Stack: Python, PySimpleGUI, Pandas, Scikit-learn, Matplotlib, Seaborn.
git clone https://github.com/dawidolko/DataFusion-App-Python.git
cd DataFusion-App-Python
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r src/requirements.txt
python src/main.py
Interactive GUI:
A simple, intuitive interface built with PySimpleGUI that lets users perform complex data operations without writing code.
Data Extraction and Transformation:
Load datasets, clean missing data, normalize, encode categorical variables, and perform feature engineering.
Statistical Analysis:
Calculate key metrics (mean, median, mode, standard deviation), visualize distributions, and explore feature correlations.
Visualization:
Generate histograms, scatter plots, and heatmaps for in-depth data insights.
Modular Architecture:
Easy to maintain and extend, with each feature separated into its own module.
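The cleaning, transformation, and statistics features listed above can be illustrated with a short pandas sketch. This is not the application's own code, which this README does not show. The function names `clean_and_transform` and `summarize`, the column names, and the four toy rows are all assumptions made up for illustration, and median/mode imputation with min-max scaling is only one reasonable choice among several.

```python
import pandas as pd

# Hypothetical toy rows loosely shaped like the Adult Income data (not real records).
raw = pd.DataFrame({
    "age": [25.0, 40.0, None, 55.0],
    "workclass": ["Private", None, "Private", "State-gov"],
    "hours_per_week": [40.0, 50.0, 40.0, 30.0],
})


def clean_and_transform(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, min-max normalize numbers, one-hot encode categories."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.columns.difference(num_cols)

    # Fill numeric gaps with the column median.
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())

    # Fill categorical gaps with the most frequent value.
    for col in cat_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])

    # Min-max normalize numeric columns to [0, 1]; a constant column maps to 0.
    col_min = out[num_cols].min()
    span = (out[num_cols].max() - col_min).replace(0, 1)
    out[num_cols] = (out[num_cols] - col_min) / span

    # One-hot encode the categorical columns.
    return pd.get_dummies(out, columns=list(cat_cols))


def summarize(series: pd.Series) -> dict:
    """Return mean, median, mode, and sample standard deviation (ddof=1)."""
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().iloc[0],
        "std": series.std(),
    }


cleaned = clean_and_transform(raw)
stats = summarize(raw["hours_per_week"])
corr = cleaned[["age", "hours_per_week"]].corr()  # Pearson correlation matrix

print(cleaned)
print(stats)
print(corr)
```

The correlation matrix returned by `corr()` is the kind of table a heatmap would be drawn from, for example with Seaborn.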
UCI Adult Income Dataset:
Demographic and employment data for income classification tasks.
UCI Chronic Kidney Disease Dataset:
Medical parameters for diagnosing chronic kidney disease (binary classification).
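Both datasets pose a binary classification problem. As a rough sketch of what such a task looks like with Scikit-learn, the example below trains a logistic regression model on synthetic data from `make_classification`. The synthetic data stands in for a cleaned dataset, and the model choice, split ratio, and feature counts are assumptions for illustration, not the application's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a cleaned, fully numeric dataset with a binary target.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=0
)

# Hold out 25% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Hold-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics fitted on the training split only, which avoids leaking information from the test rows.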
DataFusion-App-Python/
├── database/ # Raw datasets (Adult Income and Chronic Kidney Disease)
├── docs/ # Additional project documentation
│ └── description.docx
├── src/ # Application source code
│ ├── main.py # GUI entry point
│ └── requirements.txt # Python dependencies
├── LICENSE # License file
└── README.md # Project documentation
The DataFusion App Python project is licensed under the MIT License.
Created by Dawid Olko