
CNN Evolution

LeNet, 1998

Convolution

Average Pooling(kernel_size=2, stride=2)

Dense

Spatial dimensions shrink

Number of feature maps grows
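A minimal sketch of a LeNet-style stack, to make the two trends above visible in the tensor shapes. The layer sizes follow the commonly cited LeNet-5 layout for 28x28 grayscale inputs and are an assumption, not taken from these notes.

```python
import torch
from torch import nn

# LeNet-style stack: convolution -> average pooling -> dense layers.
# Channel counts and dense widths are the usual LeNet-5 values (assumed here).
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(),  # 1x28x28 -> 6x28x28
    nn.AvgPool2d(kernel_size=2, stride=2),                 # -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),            # -> 16x10x10
    nn.AvgPool2d(kernel_size=2, stride=2),                 # -> 16x5x5
    nn.Flatten(),                                          # -> 400
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),
)

# Push one dummy image through and record the shape after every layer.
x = torch.randn(1, 1, 28, 28)
shapes = []
for layer in lenet:
    x = layer(x)
    shapes.append((layer.__class__.__name__, tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:10s} {shape}")
```

The spatial size goes 28 -> 14 -> 10 -> 5 while the channel count goes 1 -> 6 -> 16.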

AlexNet, 2012

Convolution(stride)

Max Pooling

tanh -> ReLU

Padding

nn.Conv2d(kernel_size=3, stride=1, padding=1)

keeps the spatial dimensions constant
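Whether a given combination of kernel size, stride and padding preserves the input size can be checked with the standard output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$. The helper name `conv_output_size` is made up for this sketch.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (n + 2 * padding - kernel_size) // stride + 1


# Odd kernel with padding = (kernel_size - 1) / 2 keeps the size.
print(conv_output_size(32, kernel_size=3, stride=1, padding=1))  # 32
print(conv_output_size(32, kernel_size=5, stride=1, padding=2))  # 32

# An even kernel with padding=1 grows the output by one pixel.
print(conv_output_size(32, kernel_size=2, stride=1, padding=1))  # 33

# Pooling with kernel_size=2, stride=2 halves the size.
print(conv_output_size(32, kernel_size=2, stride=2, padding=0))  # 16
```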

VGGNet, 2014

Blocks

$3 \times 3$ Convolution

`kernel_size--`

nn.Conv2d(kernel_size=5, stride=1, padding=0)

$5\times5 \rightarrow \text{Conv}_{5\times5} \implies 1\times1$

$5 \times 5 = 25$ parameters

nn.Conv2d(kernel_size=3, stride=1, padding=0)
nn.Conv2d(kernel_size=3, stride=1, padding=0)

$5\times5 \rightarrow \text{Conv}_{3\times3} \implies 3\times3$

$3\times3 \rightarrow \text{Conv}_{3\times3} \implies 1\times1$

$3 \times 3 + 3 \times 3 = 18$ parameters
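The comparison above can be reproduced with a few lines of arithmetic. This is a sketch with made-up helper names (`conv_params`, `stacked_receptive_field`); it ignores biases and assumes stride-1 convolutions.

```python
def conv_params(kernel_size, in_channels=1, out_channels=1):
    """Number of weights in a square convolution, without bias."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions: each layer adds (k - 1)."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


# Same 5x5 receptive field, fewer weights.
print(stacked_receptive_field([5]), conv_params(5))         # 5 25
print(stacked_receptive_field([3, 3]), 2 * conv_params(3))  # 5 18

# The saving scales with the channel count, here 64 -> 64 channels.
print(conv_params(5, 64, 64))      # 102400
print(2 * conv_params(3, 64, 64))  # 73728
```

The stacked variant also applies a nonlinearity between the two convolutions, which a single 5x5 layer cannot.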

NiN, 2014

NiN Block

1x1 convolutions act as a small fully connected network applied at every pixel

Fully convolutional
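One way to check this claim: a 1x1 convolution and a linear layer with the same weights give the same result when the linear layer is applied to the channel vector of each pixel. The tensor sizes below are arbitrary and chosen only for illustration.

```python
import torch
from torch import nn

conv = nn.Conv2d(4, 3, kernel_size=1)  # 4 input channels -> 3 output channels
linear = nn.Linear(4, 3)

# Copy the convolution weights (3, 4, 1, 1) into the linear layer (3, 4).
with torch.no_grad():
    linear.weight.copy_(conv.weight.view(3, 4))
    linear.bias.copy_(conv.bias)

x = torch.randn(2, 4, 5, 5)  # (batch, channels, height, width)

out_conv = conv(x)
# Move channels last, apply the linear layer per pixel, move channels back.
out_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(out_conv, out_linear, atol=1e-5)
print(same)
```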

nin_block(num_classes, kernel_size=3, strides=1, padding=1),
nn.AdaptiveAvgPool2d((1, 1)) # output 1x1 per feature map
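The notes do not show how `nin_block` is defined. The sketch below is one plausible definition with the same call signature, using lazily initialised convolutions so that no input channel count is needed; the input shape and `num_classes = 10` are assumptions for the demo.

```python
import torch
from torch import nn


def nin_block(out_channels, kernel_size, strides, padding):
    """One spatial convolution followed by two 1x1 convolutions (assumed layout)."""
    return nn.Sequential(
        nn.LazyConv2d(out_channels, kernel_size, strides, padding), nn.ReLU(),
        nn.LazyConv2d(out_channels, kernel_size=1), nn.ReLU(),
        nn.LazyConv2d(out_channels, kernel_size=1), nn.ReLU(),
    )


num_classes = 10
head = nn.Sequential(
    nin_block(num_classes, kernel_size=3, strides=1, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),  # one value per feature map
    nn.Flatten(),                  # (batch, num_classes, 1, 1) -> (batch, num_classes)
)

logits = head(torch.randn(2, 64, 8, 8))
print(logits.shape)  # torch.Size([2, 10])
```

Because the last block already produces one feature map per class, global average pooling replaces the dense classifier.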

GoogLeNet, 2014

Which convolution? Yes! (all of them in parallel)

Inception Block

The outputs of all branches are concatenated along the channel dimension
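A simplified sketch of the idea: several branches with different kernel sizes see the same input, and their outputs are stacked along the channel axis. It leaves out the 1x1 reduction convolutions that the real GoogLeNet places before the 3x3 and 5x5 branches, and the class name and branch widths are illustrative assumptions.

```python
import torch
from torch import nn


class InceptionSketch(nn.Module):
    """Parallel 1x1, 3x3, 5x5 and pooling branches, concatenated on channels."""

    def __init__(self, in_channels, c1, c3, c5, cpool):
        super().__init__()
        # Padding is chosen so every branch keeps the spatial size.
        self.b1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.b3 = nn.Conv2d(in_channels, c3, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_channels, c5, kernel_size=5, padding=2)
        self.bpool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, cpool, kernel_size=1),
        )

    def forward(self, x):
        branches = [self.b1(x), self.b3(x), self.b5(x), self.bpool(x)]
        return torch.cat([torch.relu(b) for b in branches], dim=1)


block = InceptionSketch(192, c1=64, c3=128, c5=32, cpool=32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```

The output has 64 + 128 + 32 + 32 = 256 channels, and concatenation only works because all branches produce the same height and width.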

ResNet, 2015

Scaling Problem

Vanishing/Exploding gradients: Gradients -> 0/∞; ✅ ReLU; ✅ BatchNorm

Deep Learning

When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated and then degrades rapidly.
Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.

Residual Connections

$f = a + b$
$\frac{\partial f}{\partial a} = 1$ (local gradient) $\implies \frac{\partial L}{\partial a} = \frac{\partial L}{\partial f} \cdot \frac{\partial f}{\partial a} = 42 \cdot 1 = 42$ (global gradient, for an example upstream gradient $\frac{\partial L}{\partial f} = 42$)
$\implies \frac{\partial L}{\partial b} = 42$
$\implies$ the $+$ node distributes the upstream gradient unchanged to both inputs
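The same calculation can be checked with autograd. The values of `a` and `b` are arbitrary; the loss is chosen as $L = 42 \cdot f$ only so that the upstream gradient equals the 42 used above.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

f = a + b         # the addition node
loss = 42.0 * f   # makes dL/df = 42

loss.backward()

# Both inputs of the addition receive the upstream gradient unchanged.
print(a.grad.item(), b.grad.item())  # 42.0 42.0
```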


1x1 convolution on the skip path if the channel counts do not match

The effect only shows in deep networks
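A minimal sketch of a residual block with an optional 1x1 projection on the skip path. The layout (two 3x3 convolutions with BatchNorm) follows the common basic-block design; the class name and the exact ordering of layers are assumptions, not taken from these notes.

```python
import torch
from torch import nn


class ResidualSketch(nn.Module):
    """y = relu(F(x) + shortcut(x)), with a 1x1 conv shortcut when shapes differ."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if in_channels != out_channels or stride != 1:
            # Project the input so it can be added to the block output.
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                      stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(y + self.shortcut(x))


x = torch.randn(2, 64, 16, 16)
same = ResidualSketch(64, 64)(x)             # identity shortcut
wider = ResidualSketch(64, 128, stride=2)(x)  # 1x1 conv shortcut
print(same.shape, wider.shape)
```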

The connectome of an insect brain, 2023

Vision Transformer, 2021

The [class] token holds general information about the whole image
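A rough sketch of how a [class] token is typically used: a learnable vector is prepended to the patch embeddings, passes through the encoder with them, and its output alone feeds the classifier. Position embeddings and the patch embedding step are left out, and all sizes are assumptions for the demo.

```python
import torch
from torch import nn

batch, num_patches, dim, num_classes = 2, 196, 64, 10

# Stand-in for the embedded image patches (normally produced by a patch projection).
patch_embeddings = torch.randn(batch, num_patches, dim)

# One learnable [class] token, shared across the batch.
cls_token = nn.Parameter(torch.zeros(1, 1, dim))
tokens = torch.cat([cls_token.expand(batch, -1, -1), patch_embeddings], dim=1)

# A single encoder layer stands in for the full transformer encoder.
encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
encoded = encoder_layer(tokens)

# Only the output at position 0 (the [class] token) goes to the classifier.
cls_out = encoded[:, 0]
logits = nn.Linear(dim, num_classes)(cls_out)
print(tokens.shape, cls_out.shape, logits.shape)
```

Through self-attention the [class] token can attend to every patch, which is why it ends up summarising the image.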

Transfer Learning

def entries(todo_list, date) do
  todo_list.entries
  |> Map.values()
  |> Enum.filter(fn entry -> entry.date == date end)
end

Apply existing knowledge to a new task

Implementation?

Lower layers are similar across tasks

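from torchvision.models import resnet34, ResNet34_Weights
model = resnet34(weights=ResNet34_Weights.DEFAULT)  # load ImageNet-pretrained weights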

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  ...
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Sequential(...)  # new head; its parameters are trainable by default

optimizer = Adam(model.fc.parameters(), lr=1e-3)
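To check that the freeze-and-replace pattern above really trains only the new head, the sketch below runs one optimisation step on a tiny stand-in model. `TinyNet` is a hypothetical placeholder for a pretrained backbone with an `fc` attribute; the five output classes and the random data are assumptions.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network with an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyNet()
for param in model.parameters():
    param.requires_grad = False  # freeze everything

model.fc = nn.Linear(8, 5)       # new head, trainable by default
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

backbone_before = model.features[0].weight.clone()
head_before = model.fc.weight.clone()

# One training step on random data.
inputs, targets = torch.randn(4, 3, 16, 16), torch.tensor([0, 1, 2, 3])
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```

The frozen backbone receives no gradients, so the backward pass is also cheaper than full fine-tuning.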