Pig Vocalization Analysis with Self-Organizing Maps (SOM)
This project analyzes pig vocalizations using a Self-Organizing Map (SOM) to identify meaningful clusters in the audio data. The system can classify different types of pig sounds (e.g., happy, sad, hungry, in pain) and estimate the age of the animal based on vocal characteristics.
Project Structure
SOM/
├── requirements.txt # Python dependencies
├── utils.py # Audio processing utilities
├── 01_train_som.ipynb # Training notebook for SOM model
├── 01_train_som_updated.ipynb # Updated training notebook using real data
├── 02_analyze.ipynb # Analysis notebook for new samples
├── 02_analyze_updated.ipynb # Updated analysis notebook using real data
├── data/ # Audio files and metadata
│ ├── _index.csv # Metadata for audio files
│ └── *.wav # Audio recordings
├── models/ # Trained SOM models
├── reports/ # Analysis reports
└── visualizations/ # Generated plots and charts
Setup
- Install the required dependencies:
  pip install -r requirements.txt
- Place your audio files in the data/ directory
- Ensure the _index.csv file is present with metadata
Usage
Training Phase
- Run 01_train_som_updated.ipynb to train the SOM model on your audio data
- The notebook will:
  - Load audio files from the data/ directory
  - Process and segment vocalizations
  - Extract features using FFT and MFCC
  - Train the SOM model
  - Generate visualizations
  - Save the trained model
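To make the "Train the SOM model" step concrete, here is a minimal sketch of SOM training in plain Python. It is not the code from the notebook: the function names `train_som` and `best_matching_unit`, the linear decay schedule, and all default parameters are assumptions chosen for illustration, and the input vectors are made-up stand-ins for the FFT/MFCC features.

```python
import math
import random


def train_som(samples, grid_x=3, grid_y=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a tiny SOM; returns a dict mapping (x, y) grid nodes to weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each node's weight vector with random values in [0, 1).
    weights = {
        (x, y): [rng.random() for _ in range(dim)]
        for x in range(grid_x)
        for y in range(grid_y)
    }
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for vec in order:
            # Learning rate and neighbourhood radius shrink linearly over time.
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.1)
            bmu = best_matching_unit(weights, vec)
            for node, w in weights.items():
                # Nodes close to the BMU on the grid are pulled harder.
                grid_dist_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (vec[i] - w[i])
            step += 1
    return weights


def best_matching_unit(weights, vec):
    """Return the grid coordinate whose weight vector is closest to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))


# Made-up 2-D feature vectors standing in for FFT/MFCC features.
features = [[0.1, 0.2], [0.15, 0.22], [0.9, 0.8], [0.88, 0.85]]
som = train_som(features, grid_x=2, grid_y=2)
bmu = best_matching_unit(som, features[0])
print(bmu)
```

The grid size arguments correspond to the "SOM grid size (X and Y dimensions)" setting listed under Customization; a real run would use far more samples and higher-dimensional features.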
Analysis Phase
- Run 02_analyze_updated.ipynb to analyze new audio samples
- The notebook will:
  - Load the trained SOM model
  - Process new audio files
  - Map samples to SOM clusters
  - Generate reports and visualizations
  - Export cluster likelihoods
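The README does not say how cluster likelihoods are computed. One plausible approach, shown here purely as an assumption, is a softmax over the negative distances between a sample and every SOM node. The function name `cluster_likelihoods`, the `temperature` parameter, and the toy weights are hypothetical.

```python
import math


def cluster_likelihoods(weights, vec, temperature=1.0):
    """Turn distances from vec to every SOM node into values that sum to 1.

    Assumption: a softmax over negative Euclidean distances; the project's
    notebooks may define likelihood differently.
    """
    dists = {node: math.dist(w, vec) for node, w in weights.items()}
    # Subtract the smallest distance before exponentiating for numerical stability.
    d_min = min(dists.values())
    scores = {node: math.exp(-(d - d_min) / temperature) for node, d in dists.items()}
    total = sum(scores.values())
    return {node: s / total for node, s in scores.items()}


# Hypothetical 2x1 SOM with two-dimensional weight vectors.
weights = {(0, 0): [0.0, 0.0], (1, 0): [1.0, 1.0]}
probs = cluster_likelihoods(weights, [0.1, 0.0])
best = max(probs, key=probs.get)
print(best, probs)
```

The node with the highest value is the direct mapping for the sample, while the full dictionary is the kind of per-cluster row that a likelihood report could contain.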
Features
- Audio Processing: Automatically detects and segments vocalizations from silence
- Feature Extraction: Uses FFT and MFCC for audio characterization
- Clustering: Self-Organizing Map for unsupervised clustering
- Age Estimation: Estimates animal age based on dominant frequency
- Visualization: Multiple plots showing cluster distributions
- Reporting: Detailed text and CSV reports
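The audio processing feature above can be pictured as an energy threshold over short frames. The sketch below is an assumption about the general technique, not the detector in utils.py; `segment_by_energy`, `frame_len`, and `threshold` are hypothetical names and values.

```python
def segment_by_energy(signal, frame_len=4, threshold=0.1):
    """Return (start, end) sample index pairs for runs of frames above threshold.

    Assumption: mean absolute amplitude per non-overlapping frame stands in
    for whatever detector utils.py actually uses.
    """
    segments = []
    start = None
    n_frames = len(signal) // frame_len
    for f in range(n_frames):
        frame = signal[f * frame_len:(f + 1) * frame_len]
        energy = sum(abs(s) for s in frame) / frame_len
        if energy >= threshold and start is None:
            # A loud frame after silence opens a new segment.
            start = f * frame_len
        elif energy < threshold and start is not None:
            # A quiet frame closes the open segment.
            segments.append((start, f * frame_len))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments


# Toy signal: silence, a burst, silence, a second burst.
signal = [0.0] * 8 + [0.5, -0.6, 0.4, -0.5] + [0.0] * 4 + [0.3, -0.3, 0.3, -0.3]
print(segment_by_energy(signal))  # [(8, 12), (16, 20)]
```

With audio at 22050 Hz, frames would normally span hundreds of samples rather than four; the tiny frame here only keeps the example readable.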
Data Format
The _index.csv file contains metadata for each recording with columns:
- filename: Name of the audio file
- species: Type of animal
- sex: Gender of the animal
- age: Age category
- sound_type: Type of vocalization
- Additional metadata fields
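A quick way to confirm that a metadata file has the documented columns is to read it with the standard csv module. This is a minimal sketch: `load_index` and `REQUIRED_COLUMNS` are hypothetical names, and the sample row contains made-up values for illustration only.

```python
import csv
import io

REQUIRED_COLUMNS = {"filename", "species", "sex", "age", "sound_type"}


def load_index(handle):
    """Read metadata rows and fail early if a documented column is missing."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"_index.csv is missing columns: {sorted(missing)}")
    return list(reader)


# Hypothetical sample content; the values are made up for illustration.
sample = io.StringIO(
    "filename,species,sex,age,sound_type\n"
    "rec_001.wav,pig,female,adult,grunt\n"
)
rows = load_index(sample)
print(rows[0]["filename"], rows[0]["sound_type"])
```

For a real file, pass `open("data/_index.csv", newline="")` instead of the in-memory sample. Extra columns are accepted, which matches the "Additional metadata fields" entry.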
Output Files
After running the analysis, you'll find:
- models/trained_som.pkl: Trained SOM model
- data/som_index.csv: Cluster index with labels
- reports/mapping_results.csv: Direct mapping of samples to clusters
- reports/cluster_likelihood_report.csv: Probabilities for each cluster
- reports/cluster_summary_report.csv: Summary statistics
- reports/analysis_report.txt: Comprehensive text report
- visualizations/*.png: Charts and plots
Customization
You can customize the analysis by modifying:
- SOM grid size (X and Y dimensions)
- Feature extraction parameters
- Cluster labeling heuristics
- Audio processing thresholds
Notes
- The system works with various audio formats (WAV, MP3, FLAC, etc.)
- Sampling rate is standardized to 22050 Hz
- The age estimation is based on frequency analysis (higher pitch = younger animal)
- Cluster labels are preliminary and should be validated by domain experts
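The frequency analysis behind the age estimate relies on finding the dominant frequency of a segment. The sketch below uses a naive discrete Fourier transform in plain Python so that it runs without extra packages; the project itself most likely uses an FFT routine from a numerical library. `dominant_frequency` is a hypothetical name, and the mapping from frequency to an age category is not specified in this README, so it is left out.

```python
import cmath
import math

SAMPLE_RATE = 22050  # Hz, the standardized rate stated in the notes


def dominant_frequency(signal, sample_rate=SAMPLE_RATE):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        # DFT coefficient for bin k, computed directly from the definition.
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic tone that falls exactly on DFT bin 10 of a 256-sample window.
n = 256
tone_hz = 10 * SAMPLE_RATE / n
tone = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE) for t in range(n)]
print(dominant_frequency(tone))
```

The frequency resolution is the sample rate divided by the window length, about 86 Hz for this 256-sample window, so longer segments give finer estimates.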
Docker Setup
To avoid installation issues and ensure compatibility, you can run this project using Docker:
- Prerequisites: Install Docker and Docker Compose
- Build and run the container:
  docker-compose up --build
- Access Jupyter Lab: Open your browser and navigate to http://localhost:8888
- Working with the notebooks: The notebooks will be accessible through the Jupyter interface at http://localhost:8888
The Docker setup includes all necessary dependencies and creates a consistent environment that avoids the installation issues experienced with pandas and other packages.
To run commands in the container directly:
# Access the container
docker-compose exec som-analysis bash
# Execute a specific notebook from the command line
docker-compose exec som-analysis jupyter nbconvert --to notebook --execute 01_train_som_updated.ipynb