Multi-Label Classification

Mastering Multi-Label Classification in Python with Scikit-Learn⌗
Welcome back! Today, we’re diving into multi-label classification.
In this post, I’ll guide you through setting up a multi-label classification pipeline using Scikit-Learn. We’ll build a synthetic dataset, train a classifier, and evaluate its performance with metrics tailored to multi-label tasks.
What is Multi-Label Classification?⌗
Unlike traditional classification problems where each instance belongs to a single category, multi-label classification allows each instance to belong to multiple categories simultaneously. For example:
- Email classification: Emails can be tagged as both “Work” and “Important.”
- Image tagging: An image might be tagged with “Beach,” “Sunset,” and “Vacation.”
Because a prediction can be partly right (some labels correct, others missed), multi-label classification calls for its own metrics to evaluate a model’s performance.
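To make the idea concrete, here’s a minimal sketch of how label sets like the email tags above are usually encoded: one row per instance, one 0/1 column per label. The tag data is made up for illustration; MultiLabelBinarizer is Scikit-Learn’s helper for this conversion.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag sets for three emails (illustrative data only)
email_tags = [
    {"Work", "Important"},
    {"Personal"},
    {"Work"},
]

# Convert the sets of tags into a binary indicator matrix
mlb = MultiLabelBinarizer()
Y_tags = mlb.fit_transform(email_tags)

print(mlb.classes_)  # column order: labels sorted alphabetically
print(Y_tags)        # one row per email, one 0/1 column per label
```

This indicator-matrix layout is the same shape of target that the rest of this post feeds to the classifier.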
Step-by-Step Guide⌗
1. Generating a Synthetic Multi-Label Dataset⌗
To demonstrate multi-label classification, we’ll use Scikit-Learn’s make_multilabel_classification function. This generates a synthetic dataset where each instance can belong to multiple labels.
from sklearn.datasets import make_multilabel_classification
# Generate synthetic multi-label data
X, Y = make_multilabel_classification(
    n_samples=1000,   # Number of instances
    n_features=20,    # Number of features
    n_classes=5,      # Number of labels
    n_labels=2,       # Average number of labels per instance
    random_state=42
)
print("Feature matrix shape:", X.shape) # (1000, 20)
print("Label matrix shape:", Y.shape) # (1000, 5)
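Before training, it can help to look at how the labels are actually distributed. The sketch below regenerates the same dataset and prints a few summary statistics; the variable names are my own. Keep in mind that n_labels is only a target average, and with the default allow_unlabeled=True some instances may end up with no labels at all.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# Same synthetic dataset as above
X, Y = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=5, n_labels=2, random_state=42
)

# Number of active labels in each row of the label matrix
labels_per_instance = Y.sum(axis=1)

print("Mean labels per instance:", labels_per_instance.mean())
print("Instances with no labels:", int((labels_per_instance == 0).sum()))
print("Positive rate per label:", Y.mean(axis=0))
```

A very rare label (a low positive rate) is worth knowing about early, since it tends to drag down per-label recall later on.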
2. Splitting the Dataset⌗
We’ll split the dataset into training and testing subsets to evaluate our model’s performance.
from sklearn.model_selection import train_test_split
# Split the dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
3. Training a Multi-Label Classifier⌗
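We’ll use a RandomForestClassifier, which supports multi-label targets natively: pass it the 2D label matrix and each tree learns all labels jointly, rather than fitting one separate binary model per label. Only some Scikit-Learn classifiers accept multi-label targets directly (decision trees, tree ensembles, k-nearest neighbors, and MLPClassifier, among others); the rest need a wrapper such as OneVsRestClassifier or MultiOutputClassifier.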
from sklearn.ensemble import RandomForestClassifier
# Train a RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, Y_train)
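For comparison, here’s a sketch of the wrapper route for an estimator that does not accept a label matrix on its own. I’m assuming LogisticRegression as the base model and max_iter=1000 purely for illustration; OneVsRestClassifier fits one binary classifier per label column.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Same data and split as in the steps above
X, Y = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=5, n_labels=2, random_state=42
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

# One independent binary LogisticRegression per label column
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr_model.fit(X_train, Y_train)

Y_pred_ovr = ovr_model.predict(X_test)
print("Fitted binary models:", len(ovr_model.estimators_))
print("Prediction matrix shape:", Y_pred_ovr.shape)
```

This approach ignores any correlation between labels, which is the main trade-off against a model that learns them jointly.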
4. Making Predictions⌗
Once trained, we’ll use the model to predict labels for the test data.
# Predict labels for the test set
Y_pred = model.predict(X_test)
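If the default decision rule is too conservative for your use case, you can work from probabilities instead. The sketch below is one way to do it, not the only one: for multi-label targets, RandomForestClassifier.predict_proba returns a list with one array per label, and the 0.3 threshold is a hypothetical value chosen only to show the mechanics.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same data, split, and model as in the steps above
X, Y = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=5, n_labels=2, random_state=42
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, Y_train)

# One (n_samples, 2) array per label; column 1 is P(label is present).
# This assumes every label has both classes in the training data.
proba_per_label = model.predict_proba(X_test)
proba = np.column_stack([p[:, 1] for p in proba_per_label])

threshold = 0.3  # hypothetical value, tune it on validation data
Y_pred_custom = (proba >= threshold).astype(int)

print("Probability matrix shape:", proba.shape)
print("Labels predicted by default:", int(model.predict(X_test).sum()))
print("Labels predicted at 0.3:", int(Y_pred_custom.sum()))
```

Lowering the threshold trades precision for recall, so the right value depends on which kind of mistake costs you more.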
5. Evaluating the Model⌗
Evaluating multi-label models requires specialized metrics. They’re fairly intuitive, and which one(s) you rely on to inform your decisions depends on your business problem. Ultimately, you’ll need to put on your business hat or talk to a subject matter expert to decide which metrics are appropriate. Here are the ones we’ll use:
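- Hamming Loss: Measures the fraction of individual label assignments (instance-label pairs) that are predicted incorrectly.
- Jaccard Score: Measures the overlap between the predicted and true label sets (size of the intersection divided by size of the union).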
- Exact Match Accuracy: Evaluates the percentage of instances where the predicted label set matches the true label set exactly.
- Classification Report: Provides detailed precision, recall, and F1 scores for each label.
from sklearn.metrics import classification_report, hamming_loss, jaccard_score, accuracy_score
# Hamming Loss: Lower is better
print("Hamming Loss:", hamming_loss(Y_test, Y_pred))
# Jaccard Score: Higher is better
print("Jaccard Score (samples average):", jaccard_score(Y_test, Y_pred, average='samples'))
# Exact Match Accuracy: Strict metric
exact_match_accuracy = accuracy_score(Y_test, Y_pred)
print("Exact Match Accuracy:", exact_match_accuracy)
# Detailed Classification Report
print("\nClassification Report:")
print(classification_report(Y_test, Y_pred, zero_division=0))
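To see what the first three metrics actually compute, here’s a small hand-worked sketch on made-up data (3 instances, 3 labels) that checks manual NumPy calculations against Scikit-Learn. It assumes every row has at least one true or predicted label, so the Jaccard union is never empty.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, jaccard_score

# Toy data (illustrative only): 3 instances, 3 labels
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
Y_hat = np.array([[1, 0, 0],   # misses the third label
                  [0, 1, 0],   # exact match
                  [1, 1, 1]])  # adds a wrong third label

# Hamming loss: share of the 9 individual label slots that are wrong
manual_hamming = (Y_true != Y_hat).mean()            # 2 / 9

# Jaccard (samples average): |intersection| / |union| per row, then averaged
intersection = (Y_true & Y_hat).sum(axis=1)
union = (Y_true | Y_hat).sum(axis=1)
manual_jaccard = (intersection / union).mean()       # (1/2 + 1 + 2/3) / 3

# Exact match: share of rows where every label agrees
manual_exact = (Y_true == Y_hat).all(axis=1).mean()  # 1 / 3

print("Hamming:", manual_hamming, hamming_loss(Y_true, Y_hat))
print("Jaccard:", manual_jaccard, jaccard_score(Y_true, Y_hat, average="samples"))
print("Exact match:", manual_exact, accuracy_score(Y_true, Y_hat))
```

Notice how two small mistakes cost only 2 of 9 label slots under Hamming Loss, yet knock out 2 of 3 instances under Exact Match Accuracy. That gap is why the strict metric usually looks much worse.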
6. Sample Output⌗
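Here’s what the output might look like for a well-performing model. These figures are illustrative, not the actual output of the code above, so expect different numbers when you run it (for one thing, with a 300-instance test set, no label’s support can exceed 300):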
Hamming Loss: 0.03
Jaccard Score (samples average): 0.85
Exact Match Accuracy: 0.78
Classification Report:
              precision    recall  f1-score   support

           0       0.90      0.85      0.87       300
           1       0.82      0.88      0.85       310
           2       0.86      0.80      0.83       290
           3       0.88      0.91      0.89       320
           4       0.85      0.86      0.85       280

   micro avg       0.86      0.86      0.86      1500
   macro avg       0.86      0.86      0.86      1500
weighted avg       0.86      0.86      0.86      1500
 samples avg       0.85      0.85      0.85      1500
Key Takeaways⌗
- Scikit-Learn’s make_multilabel_classification makes it easy to create synthetic datasets for testing.
- The right metrics, like Hamming Loss and Jaccard Score, are essential for understanding performance in multi-label settings, but ultimately the choice of metric(s) depends on the business problem.
- Some Scikit-Learn classifiers, such as RandomForestClassifier, natively support multi-label learning; the rest can be wrapped in a meta-estimator such as OneVsRestClassifier.