A Simple Convolutional Model Explained
What is SimpleCNN?
This is a project I made while following my first tutorial on convolutional neural networks. Practicing by building something always helps me more than just memorizing info. That’s why I did this: to learn through doing.
Dataset
I’m using the CIFAR10 dataset. It’s made up of 60,000 tiny 32x32 color images divided into 10 classes (50,000 for training, 10,000 for testing). You see it everywhere when you’re starting out with object classification.
To get started, import these libraries.
We’re using PyTorch and torchvision for the deep learning part, plus matplotlib for looking at pictures:
import torch
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import torchvision
Now let’s actually load the dataset, splitting it into training_data and test_data:
training_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True)
- root: where to store/download the data
- train: True for the training set, False for the test set
- download: True to download the data if you don’t have it yet
Want to see what the samples look like? Here’s how:
image, label = training_data[0]
plt.imshow(image)
plt.xlabel(f"label: {label}")
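Side note: the label is just an integer. The torchvision CIFAR10 object also keeps a classes list, so you can look up the human-readable name too, roughly like this:
image, label = training_data[0]
class_name = training_data.classes[label]  # classes maps the integer label to a name like "cat" or "ship"
plt.imshow(image)
plt.xlabel(f"label: {label} ({class_name})")
plt.show()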
Data Transformation
PyTorch models don’t consume images directly; they work with tensors. The ToTensor transform converts each image to a tensor (and scales the pixel values to the range [0, 1]):
from torchvision.transforms import ToTensor
training_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=ToTensor())
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=ToTensor())
DataLoader (Batches)
You don’t want to feed one image at a time. That’s slow. Instead, you group images into batches:
from torch.utils.data import DataLoader
train_loader = DataLoader(training_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)  # no need to shuffle the test set
- batch_size: how many images per group
- shuffle: randomizes the order each epoch, which helps with generalization
You can inspect the first batch:
for images, labels in train_loader:
    print(f"images.shape: {images.shape}")
    print(f"labels.shape: {labels.shape}")
    break
Device Setup
To train faster, we check whether a GPU is available and use it when possible (the model and the batches get moved to this device below):
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
Model Definition
Now, the actual CNN.
This is a super basic one—a couple of conv layers, pooling, then fully connected layers:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fully_connected1 = nn.Linear(16 * 5 * 5, 120)
        self.fully_connected2 = nn.Linear(120, 84)
        self.fully_connected3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fully_connected1(x))
        x = F.relu(self.fully_connected2(x))
        x = self.fully_connected3(x)
        return x
net = Net().to(device)  # move the model to the GPU if one is available
print(net)
Shape math note: the 16 * 5 * 5 in the first fully connected layer comes from tracking how the convs and pooling shrink the image: 32x32 input → conv1 (5x5 kernel, no padding) → 28x28 → 2x2 max pool → 14x14 → conv2 → 10x10 → pool → 5x5, with 16 channels coming out of conv2, so 16 * 5 * 5 = 400 values per image.
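If you want to sanity-check that arithmetic, you can push a dummy batch through the two conv/pool stages and print the shapes (a quick sketch just for verification, reusing the net and device defined above):
dummy = torch.zeros(4, 3, 32, 32).to(device)  # a fake batch of 4 RGB 32x32 images
out = net.pool(F.relu(net.conv1(dummy)))      # -> [4, 6, 14, 14]
out = net.pool(F.relu(net.conv2(out)))        # -> [4, 16, 5, 5]
print(out.shape)                              # torch.Size([4, 16, 5, 5])
print(torch.flatten(out, 1).shape)            # torch.Size([4, 400]), i.e. 16 * 5 * 5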
Set Up Loss and Optimizer
We use cross-entropy loss (the standard choice for classification) and plain old SGD with momentum as the optimizer.
import torch.optim as optim
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Training Loop
This is where the model learns.
Twenty epochs, meaning it goes over the training data 20 times:
for epoch in range(20):
    running_loss = 0.0
    for i, data in enumerate(train_loader, start=0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the same device as the model
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}, loss: {running_loss:.3f}")
print("Finished training!")
- optimizer.zero_grad(): clears the gradients from the previous step
- Forward pass → outputs
- Compute the loss, backpropagate, take an optimizer step

You should see the loss drop each epoch; if not, something is off.
Testing and Accuracy
Once it’s trained, check the accuracy:
correct = 0
total = 0
net.eval()  # switch to evaluation mode (only matters once you add dropout/batch norm, but it's a good habit)
with torch.inference_mode():
    for data in test_loader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy of the network over the 10,000 test images: {correct / total * 100:.2f}%")
Results
You’ll probably see something like 45-55% accuracy—not terrible, but also not state-of-the-art (and honestly, that’s normal for a basic CNN on CIFAR10).
Wrap-up / What I Learned
Building this was less about making a great model and more about actually running things end to end. Tbh, writing it up was more boring than coding it, but now at least I can share it or look back without forgetting what I did.
Things I learned:
- Transformations and batch handling matter a lot.
- It is possible to process images with surprisingly little code.
- Loss going down is always satisfying :)
What I’d do next time:
- Use a more complex model like ResNet (see the sketch below).
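To make that concrete: torchvision already ships ResNet implementations, so swapping the model could look roughly like this sketch (I haven't trained it myself, and ResNet-18 was designed for larger images than 32x32, so people often tweak its first conv layer for CIFAR10; the rest of the training loop stays the same):
from torchvision.models import resnet18

net = resnet18(num_classes=10).to(device)  # randomly initialized ResNet-18 with 10 output classes for CIFAR10
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # re-create the optimizer for the new parameters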
Repo Link
If you came here from my repo, hey! If not, here’s the code on GitHub.