A Simple Convolutional Model Explained
What is SimpleCNN?
This is a project I made while following my first tutorial on convolutional neural networks. Practicing by building something always helps me more than just memorizing info. That’s why I did this: to learn through doing.
Dataset
I’m using the CIFAR10 dataset. It’s made up of 60,000 tiny 32x32 color images divided into 10 classes (50,000 for training, 10,000 for testing). You see it everywhere when you’re starting out with object classification.
To get started, import these libraries.
We’re using PyTorch and torchvision for the deep learning part, plus matplotlib for looking at pictures:
import torch
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import torchvision
Now let’s actually load the dataset, splitting it into training_data and test_data:
training_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True)
- root: where to store/download the data
- train: True for the training set, False for the test set
- download: True to download the data if you don’t have it yet
Want to see what the samples look like? Here’s how:
image, label = training_data[0]
plt.imshow(image)
plt.xlabel(f"label: {label}")
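Side note: the label is just an integer. The torchvision CIFAR10 object also keeps a classes list, so you can look up the human-readable name too, roughly like this:
image, label = training_data[0]
class_name = training_data.classes[label]  # classes maps the integer label to a name like "cat" or "ship"
plt.imshow(image)
plt.xlabel(f"label: {label} ({class_name})")
plt.show()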
Data Transformation
PyTorch models don’t consume images directly; they work with tensors. The ToTensor transform converts each image to a tensor (and scales the pixel values to the range [0, 1]):
from torchvision.transforms import ToTensor
training_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=ToTensor())
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=ToTensor())
DataLoader (Batches)
You don’t want to feed one image at a time. That’s slow. Instead, you group images into batches:
from torch.utils.data import DataLoader
train_loader = DataLoader(training_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)  # no need to shuffle the test set
- batch_size: how many images per group
- shuffle: randomizes the order each epoch, which helps with generalization
You can inspect the first batch:
for images, labels in train_loader:
    print(f"images.shape: {images.shape}")
    print(f"labels.shape: {labels.shape}")
    break
Device Setup
To train faster, we check whether a GPU is available and use it when possible (the model and the batches get moved to this device below):
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
Model Definition
Now, the actual CNN.
This is a super basic one—a couple of conv layers, pooling, then fully connected layers:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fully_connected1 = nn.Linear(16 * 5 * 5, 120)
        self.fully_connected2 = nn.Linear(120, 84)
        self.fully_connected3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fully_connected1(x))
        x = F.relu(self.fully_connected2(x))
        x = self.fully_connected3(x)
        return x
net = Net().to(device)  # move the model to the GPU if one is available
print(net)
Shape math note: the 16 * 5 * 5 in the first fully connected layer comes from tracking how the convs and pooling shrink the image: 32x32 input → conv1 (5x5 kernel, no padding) → 28x28 → 2x2 max pool → 14x14 → conv2 → 10x10 → pool → 5x5, with 16 channels coming out of conv2, so 16 * 5 * 5 = 400 values per image.
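If you want to sanity-check that arithmetic, you can push a dummy batch through the two conv/pool stages and print the shapes (a quick sketch just for verification, reusing the net and device defined above):
dummy = torch.zeros(4, 3, 32, 32).to(device)  # a fake batch of 4 RGB 32x32 images
out = net.pool(F.relu(net.conv1(dummy)))      # -> [4, 6, 14, 14]
out = net.pool(F.relu(net.conv2(out)))        # -> [4, 16, 5, 5]
print(out.shape)                              # torch.Size([4, 16, 5, 5])
print(torch.flatten(out, 1).shape)            # torch.Size([4, 400]), i.e. 16 * 5 * 5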
Set Up Loss and Optimizer
We use cross-entropy loss (the standard choice for classification) and plain old SGD with momentum as the optimizer.
import torch.optim as optim
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Training Loop
This is where the model learns.
Twenty epochs, meaning it goes over the training data 20 times:
for epoch in range(20):
    running_loss = 0.0
    for i, data in enumerate(train_loader, start=0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the same device as the model
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}, loss: {running_loss:.3f}")
print("Finished training!")
- optimizer.zero_grad(): clears the gradients from the previous step
- Forward pass → outputs
- Compute the loss, backpropagate, take an optimizer step

You should see the loss drop each epoch; if not, something is off.
Testing and Accuracy
Once it’s trained, check the accuracy:
correct = 0
total = 0
net.eval()  # switch to evaluation mode (only matters once you add dropout/batch norm, but it's a good habit)
with torch.inference_mode():
    for data in test_loader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy of the network over the 10,000 test images: {correct / total * 100:.2f}%")
Results
You’ll probably see something like 45-55% accuracy—not terrible, but also not state-of-the-art (and honestly, that’s normal for a basic CNN on CIFAR10).
Wrap-up / What I Learned
Building this was less about making a great model and more about actually running things end to end. Tbh, writing it up was more boring than coding it, but now at least I can share it or look back without forgetting what I did.
Things I learned:
- Transformations and batch handling matter a lot.
- It is possible to process images with surprisingly little code.
- Loss going down is always satisfying :)
What I’d do next time:
- Use a more complex model like ResNet (see the sketch below).
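To make that concrete: torchvision already ships ResNet implementations, so swapping the model could look roughly like this sketch (I haven't trained it myself, and ResNet-18 was designed for larger images than 32x32, so people often tweak its first conv layer for CIFAR10; the rest of the training loop stays the same):
from torchvision.models import resnet18

net = resnet18(num_classes=10).to(device)  # randomly initialized ResNet-18 with 10 output classes for CIFAR10
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # re-create the optimizer for the new parameters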
Repo Link
If you came here from my repo, hey! If not, here’s the code on GitHub.