DataFusion - Python Application

Project as part of the Data Warehousing subject.


Project maintained by dawidolko.

DataFusion-App-Python

Powerful Data Analysis and Machine Learning GUI Application - Build comprehensive data science platforms with Python, PySimpleGUI, and advanced analytics capabilities

Description

Welcome to the DataFusion App repository! DataFusion is a Python GUI application for data analysis and machine learning on real-world data. It works with two datasets, the UCI Adult Income dataset and the UCI Chronic Kidney Disease dataset, and provides tools for data exploration, cleaning, transformation, statistical analysis, and predictive modeling.

The interface is built with PySimpleGUI, and the analysis relies on Pandas, Scikit-learn, Matplotlib, and Seaborn. The project demonstrates data science workflows, GUI development, and a modular application architecture. It is intended for learning data analysis, machine learning algorithms, and how to build interactive data science applications.

Repository Structure


DataFusion-App-Python/
├── database/                 # Raw datasets
│   ├── adult.csv             # UCI Adult Income Dataset
│   ├── chronic.csv           # UCI Chronic Kidney Disease Dataset
│   └── README.md             # Dataset documentation
├── docs/                     # Project documentation
│   ├── description.docx      # Detailed project description
│   ├── user-guide.pdf        # User manual
│   └── analysis-report.pdf   # Analysis results
├── src/                      # Application source code
│   ├── main.py               # GUI entry point and main application
│   ├── data_handler.py       # Data loading and processing
│   ├── visualization.py      # Plotting and visualization
│   ├── ml_models.py          # Machine learning algorithms
│   ├── statistics.py         # Statistical analysis functions
│   ├── preprocessing.py      # Data cleaning and transformation
│   ├── assets/               # Application assets
│   │   └── screen-app.png    # Application screenshot
│   └── requirements.txt      # Python dependencies
├── LICENSE                   # MIT License
└── README.md                 # Project documentation

Getting Started

1. Clone the Repository

git clone https://github.com/dawidolko/DataFusion-App-Python.git
cd DataFusion-App-Python

2. Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Linux/macOS:
source venv/bin/activate

# On Windows:
venv\Scripts\activate

3. Install Dependencies

# Install required packages
pip install -r src/requirements.txt
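
After installing, a quick way to confirm that the environment is usable is to check that the main libraries can be found. The sketch below is only an illustration: the module names are assumed from the libraries this README mentions, and `src/requirements.txt` remains the authoritative list. The helper name `missing_modules` is hypothetical and not part of the project.

```python
import importlib.util

# Import names assumed from the libraries named in this README;
# src/requirements.txt is the authoritative dependency list.
ASSUMED_MODULES = ["PySimpleGUI", "pandas", "sklearn", "matplotlib", "seaborn"]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

If anything is reported missing, re-running the `pip install` command inside the activated virtual environment is the first thing to try.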

4. Start the Application

# Run the main application
python src/main.py

System Requirements

Essential Tools:

Development Environment:

Required Python Libraries:

Key Features

Interactive GUI Interface

Data Extraction and Transformation

Statistical Analysis

Machine Learning Algorithms

Classification Models:

Clustering:

Association Rules:

Data Visualization

Modular Architecture

Educational Focus

Technologies Used

Datasets

UCI Adult Income Dataset

Demographic and employment data for income classification tasks:

UCI Chronic Kidney Disease Dataset

Medical parameters for diagnosing chronic kidney disease:

Both datasets are included in the database/ directory with complete documentation.

Usage Guide

1. Loading Data

Launch the application and select “Load Dataset” from the menu. Choose between the UCI Adult Income dataset and the UCI Chronic Kidney Disease dataset.
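
Outside the GUI, loading one of the bundled CSV files with Pandas might look roughly like the sketch below. It is not the code in `data_handler.py`. The function name `load_dataset`, the `?` missing-value marker, and the sample column names are assumptions for illustration, and a small inline sample stands in for `database/adult.csv` so the snippet runs on its own.

```python
import io

import pandas as pd


def load_dataset(source, missing_marker="?"):
    """Read a CSV into a DataFrame, treating `missing_marker` as a missing value.

    `source` can be a file path such as "database/adult.csv" or a file-like object.
    """
    return pd.read_csv(source, na_values=[missing_marker], skipinitialspace=True)


# Inline sample standing in for the real file; column names are illustrative only.
sample_csv = io.StringIO(
    "age,workclass,income\n"
    "39, State-gov, <=50K\n"
    "50, ?, >50K\n"
)
df = load_dataset(sample_csv)
print(df)
```

Passing `skipinitialspace=True` strips the space that follows each comma, so the `?` marker is recognized even when the file pads its values.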

2. Data Exploration

Use the data exploration tools to:
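
As one illustration of what a first look at a dataset can involve, the sketch below builds a per-column overview with Pandas. The helper `summarize` and the sample columns are hypothetical and are not taken from the application's source.

```python
import pandas as pd


def summarize(df):
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "unique": df.nunique(),  # missing values are not counted as a category
        }
    )


# Small illustrative frame; the real datasets have many more columns.
df = pd.DataFrame(
    {
        "age": [39, 50, 38, None],
        "workclass": ["Private", None, "Private", "State-gov"],
    }
)
summary = summarize(df)
print(summary)
```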

3. Data Preprocessing

Apply preprocessing operations:
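
A minimal sketch of two common cleaning steps, filling missing values and one-hot encoding categorical columns, is shown below. The specific choices (median for numeric columns, most frequent value for the rest) are assumptions for illustration and may differ from what `preprocessing.py` does. The sketch also assumes every column has at least one non-missing value.

```python
import pandas as pd


def preprocess(df):
    """Fill missing values, then one-hot encode the non-numeric columns."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: replace gaps with the median of the observed values.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: replace gaps with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return pd.get_dummies(out)


df = pd.DataFrame(
    {
        "age": [20, None, 40, 30],
        "workclass": ["Private", None, "State-gov", "Private"],
    }
)
clean = preprocess(df)
print(clean)
```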

4. Statistical Analysis

Generate statistical insights:
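
For a sense of the kind of output this step can produce, the sketch below computes basic descriptive statistics and a correlation matrix with Pandas. The function `describe_numeric` and the sample column names are hypothetical, and the statistics offered by the application itself may differ.

```python
import pandas as pd


def describe_numeric(df):
    """Return (summary, correlation) for the numeric columns of `df`."""
    numeric = df.select_dtypes(include="number")
    summary = numeric.agg(["mean", "median", "std", "min", "max"])
    correlation = numeric.corr()  # Pearson correlation by default
    return summary, correlation


df = pd.DataFrame(
    {
        "age": [25, 35, 45, 55],
        "hours_per_week": [20, 30, 40, 50],
        "workclass": ["Private", "Private", "State-gov", "Private"],
    }
)
summary, correlation = describe_numeric(df)
print(summary)
print(correlation)
```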

5. Machine Learning

Train and evaluate models:
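
The train-and-evaluate cycle can be sketched with Scikit-learn as follows. The choice of a decision tree, the 75/25 split, and the synthetic data are all assumptions made so the example is self-contained. This README does not confirm which models `ml_models.py` implements.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(X, y, seed=0):
    """Fit a small decision tree and return (model, accuracy on held-out data)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = DecisionTreeClassifier(max_depth=3, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Synthetic data stands in for a preprocessed dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model, accuracy = train_and_evaluate(X, y)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Evaluating on data the model has not seen during training gives a more honest estimate than accuracy measured on the training rows.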

6. Visualization

Create insightful visualizations:
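
As a rough example, a histogram of a single numeric column can be drawn with Matplotlib as sketched below. The helper `plot_histogram`, the sample values, and the output file name are hypothetical. The non-interactive `Agg` backend is used here only so the snippet runs without a display, whereas the application presumably renders plots through its GUI.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_histogram(df, column, bins=10):
    """Return a Figure showing the distribution of one numeric column."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df[column].dropna(), bins=bins)
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    ax.set_title(f"Distribution of {column}")
    return fig


df = pd.DataFrame({"age": [22, 25, 31, 38, 39, 45, 50, 52, 61, None]})
fig = plot_histogram(df, "age", bins=5)
# fig.savefig("age_histogram.png")  # uncomment to write the plot to a file
```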

Application Screenshot

DataFusion App Interface

🀝 Contributing

Contributions are welcome! Here's how you can help:

Feel free to open issues or reach out through GitHub for any questions or suggestions.

Author

Created by Dawid Olko - Part of the data science and machine learning series.

License

This project is open source and available under the MIT License.