
CNN Evolution


LeNet - 1998

  • Convolution
  • Average Pooling(kernel_size=2, stride=2)
  • Dense

  • Spatial dimensions shrink
  • Number of feature maps grows
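A minimal sketch of a LeNet-style network in PyTorch, following the pattern above (conv, average pooling, dense). Layer widths follow the classic LeNet-5 convention and are assumptions here, not taken from the notes:

```python
import torch
import torch.nn as nn

# LeNet-5-style sketch: conv -> avg pool -> conv -> avg pool -> dense layers
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),          # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),          # 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

x = torch.randn(1, 1, 28, 28)
print(lenet(x).shape)  # torch.Size([1, 10])
```

Note how the two trends above show up directly: spatial size drops 28 → 14 → 5 while channels grow 1 → 6 → 16.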

AlexNet - 2012

  • Convolution(stride)
  • Max Pooling
  • tanh -> ReLU

Padding

nn.Conv2d(kernel_size=3, stride=1, padding=1)

keeps dimensions constant (odd kernel with padding = (kernel_size - 1) / 2)
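A quick shape check for "same" padding (the channel counts here are arbitrary, chosen only for the demo):

```python
import torch
import torch.nn as nn

# With an odd kernel and padding = (kernel_size - 1) // 2 at stride 1,
# the spatial dimensions are preserved
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)  # torch.Size([1, 8, 32, 32])
```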


VGGNet, 2014

  • Blocks
  • $3 \times 3$ Convolution

kernel_size--

nn.Conv2d(kernel_size=5, stride=1, padding=0)

$5\times5 \rightarrow Conv_{5\times5} \implies 1\times1$

$5 \times 5 = 25$ parameters

nn.Conv2d(kernel_size=3, stride=1, padding=0)
nn.Conv2d(kernel_size=3, stride=1, padding=0)

$5\times5 \rightarrow Conv_{3\times3} \implies 3\times3$

$3\times3 \rightarrow Conv_{3\times3} \implies 1\times1$

$3 \times 3 + 3 \times 3 = 18$ parameters
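The arithmetic above can be verified directly: one 5x5 conv and two stacked 3x3 convs cover the same receptive field but with fewer weights (single-channel, bias-free convs here, to match the counts in the notes):

```python
import torch
import torch.nn as nn

# One 5x5 conv vs. two stacked 3x3 convs: same receptive field, fewer weights
conv5 = nn.Conv2d(1, 1, kernel_size=5, bias=False)
stack3 = nn.Sequential(
    nn.Conv2d(1, 1, kernel_size=3, bias=False),
    nn.Conv2d(1, 1, kernel_size=3, bias=False),
)

x = torch.randn(1, 1, 5, 5)
print(conv5(x).shape, stack3(x).shape)              # both reduce 5x5 -> 1x1
print(sum(p.numel() for p in conv5.parameters()))   # 25
print(sum(p.numel() for p in stack3.parameters()))  # 18
```

The stacked version also inserts an extra nonlinearity between the two convs in a real network, which the single 5x5 conv cannot.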


NiN, 2014


NiN Block

1x1 convs act as an ANN applied per pixel


Fully Convolutional

nin_block(num_classes, kernel_size=3, strides=1, padding=1),
nn.AdaptiveAvgPool2d((1, 1)) # output 1x1 per feature map
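A possible definition of the `nin_block` used above, following the common d2l.ai convention (the signature with an explicit `in_channels` is an assumption, the snippet above omits it):

```python
import torch
import torch.nn as nn

# NiN block sketch: one regular conv followed by two 1x1 convs,
# which act like a small per-pixel MLP on the channel dimension
def nin_block(in_channels, out_channels, kernel_size, strides, padding):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, strides, padding), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
    )

blk = nin_block(3, 10, kernel_size=3, strides=1, padding=1)
print(blk(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 10, 8, 8])
```

With `out_channels = num_classes` in the last block, the global average pool then yields one score per class with no dense layer at all.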

GoogLeNet, 2014

which convolution? yes!


Inception Block

all channels are concatenated
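A minimal Inception-style block as a sketch; the branch widths are arbitrary assumptions, not GoogLeNet's actual configuration. Each branch preserves the spatial size so the outputs can be concatenated along the channel dimension:

```python
import torch
import torch.nn as nn

# Inception block sketch: run 1x1, 3x3, 5x5 convs and max pooling in parallel,
# then concatenate all branch outputs along the channel dimension
class Inception(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),
                                nn.Conv2d(16, 32, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, kernel_size=1),
                                nn.Conv2d(8, 16, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        # every branch keeps H x W, so only the channel counts add up: 16+32+16+16
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 3, 28, 28)
print(Inception(3)(x).shape)  # torch.Size([1, 80, 28, 28])
```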


ResNet, 2015


Scaling Problem

Vanishing/Exploding gradients
Gradients -> 0/∞
✅ ReLU
✅ BatchNorm

Deep Learning

When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated and then degrades rapidly.
Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error.


Residual Connections


  • $f = a + b$
  • $\frac{\partial f}{\partial a} = 1$ local gradient; with upstream gradient $\frac{\partial L}{\partial f} = 42$ (an arbitrary example value) $\implies \frac{\partial L}{\partial a} = \frac{\partial L}{\partial f} \cdot \frac{\partial f}{\partial a} = 42 \cdot 1 = 42$ global gradient
    $\implies \frac{\partial L}{\partial b} = 42$
    $\implies +$ distributes the upstream gradient

-----------

1x1 conv on the shortcut if the channel counts don't match
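A residual block sketch showing both cases; the layer sizes are illustrative. The 1x1 conv on the shortcut is only needed when the channel count (or stride) changes:

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 conv on the shortcut only if shapes would otherwise not match
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1
                         else nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride))

    def forward(self, x):
        y = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        # the "+" distributes the upstream gradient to both paths
        return torch.relu(y + self.shortcut(x))

x = torch.randn(1, 64, 8, 8)
print(Residual(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 4, 4])
```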


The effect only shows up in deep networks


The connectome of an insect brain, 2023


Vision Transformer, 2021

the [class] token holds global information about the image


Transfer Learning

  • Reapply existing knowledge
  • Implementation?

Lower layers are similar across tasks



from torchvision.models import resnet34  # pretrained weights: resnet34(weights="IMAGENET1K_V1")

model = resnet34()
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  ...
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
# freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad = False

# replace the head; its fresh parameters require grad by default
model.fc = nn.Sequential(...)

# optimize only the new head
optimizer = Adam(model.fc.parameters(), lr=1e-3)