📚 [View the paper](https://arxiv.org/abs/1709.01507) 📚 [View the code](https://github.com/hujie-frank/SENet)

# Abstract

- The basic building block of a CNN is the **convolution operation**.
- Convolution fuses **spatial** and **channel-wise** information within a **local receptive field (the kernel size)** to extract useful features.
- Much recent work improves the **spatial encoding** to raise a network's representational power.
- This paper instead focuses on the **relationship between channels (channel relationship)**.
- To this end it proposes a new architectural unit, the **Squeeze-and-Excitation (SE) block**.
- An SE block learns **per-channel importance** and uses it to recalibrate each channel.
- Stacking SE blocks across layers gives the **SENet architecture**, which brought generalised performance gains on several datasets.
- Adding SE blocks to existing CNNs **increases computation only marginally while improving performance substantially**.
- With this, SENet **took 1st place in the ILSVRC 2017 ImageNet classification challenge**.

# 1. Introduction

![[Squeeze-and-Excitation block.png]]

- Each convolution layer of a CNN learns **local spatial connectivity patterns** across its input channels.
- In other words, a filter builds features by combining **spatial structure and channel information**.
- The **Inception** family (GoogLeNet) captures multi-scale spatial information by using **kernels of several sizes (1x1, 3x3, 5x5) in parallel**.
- **Spatial attention** methods let a model learn **'where to look'** spatially.
- This paper focuses not on the spatial dimension but on **channel interdependencies**.
- It proposes the **Squeeze-and-Excitation (SE) block**, which explicitly models the interactions between channels to strengthen the **representational power** of the network.
- The key idea is to use **global context** to **emphasise useful channels** and **suppress less useful ones** → the paper calls this **feature recalibration**.
- A conventional CNN layer is just the transformation F<sub>tr</sub> : X → U; an SE block processes the output U through three further steps, **Squeeze → Excitation → Scale** (a minimal code sketch follows at the end of this section).
    - Squeeze: compress the spatial dimensions with **average pooling** to obtain a global activation value for each channel.
    - Excitation: from that vector, learn the **per-channel importance**.
    - Scale: multiply the original feature map by the per-channel gates to **emphasise or suppress** each channel.
- Because the SE block is so simple, it can be **dropped into existing architectures (a drop-in module)** and is **lightweight**, adding almost no computation.
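To make the three steps concrete, below is a minimal PyTorch sketch of an SE block, following the Squeeze → Excitation → Scale description above. This is my own illustration, not the authors' reference code; the class name `SEBlock` and the exact layer choices (e.g. biases in the FC layers) are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block: Squeeze -> Excitation -> Scale."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # global average pooling
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // r),           # W1: reduce C -> C/r
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),           # W2: restore C/r -> C
            nn.Sigmoid(),                                 # channel gates in [0, 1]
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = u.shape
        z = self.squeeze(u).view(n, c)                    # (N, C): one scalar per channel
        s = self.excitation(z).view(n, c, 1, 1)           # (N, C, 1, 1): channel weights
        return u * s                                      # Scale: reweight each channel

# quick shape check
x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)   # torch.Size([2, 64, 32, 32])
```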
# 2. Related Work

## Deep Architectures

- VGGNet: demonstrated the value of depth with a simple architecture of **stacked 3x3 convolutions**.
- Inception: learns **multi-scale spatial features** by using kernels of several sizes in parallel.
- Batch Normalization (BN): stabilises training by normalising input distributions.
- [[Deep Residual Learning for Image Recognition|ResNet]]: resolves gradient-flow problems with skip connections.
- Highway Networks: an early idea that controls shortcuts with **gates**.
- DenseNet, Dual Path Networks: redesign how layers are connected to strengthen **feature reuse and representational diversity**.
- Grouped convolution (ResNeXt): splits channels into groups to improve computational efficiency.
- Multi-branch structures (the Inception family): combine several transformations in parallel to form diverse feature combinations.
- However, none of these models **explicitly models the relationships between channels**.

## Attention and gating mechanisms

- Attention: lets a network **focus more on the important parts** of the input instead of processing everything equally.
- Image captioning: learns spatial attention over 'where' in the image to look.
- Lip reading, visual localisation: focus on the temporal/spatial parts of the visual signal.
- Residual Attention Network: generates an **attention map** inside ResNet with a **trunk + mask structure**.
- SENet applies the idea of attention along the **channel dimension**.

# 3. Squeeze-and-Excitation Blocks

- A convolution in a CNN proceeds as follows.
- ![[Transformation Formula.png]]
    - v<sub>c</sub>: the c-th convolution filter
    - x<sup>s</sup>: the s-th channel of the input
    - v<sub>c</sub> * x<sup>s</sup>: the per-channel convolution results are summed to form u<sub>c</sub>
- A standard convolution therefore captures channel relationships only **implicitly**: they are entangled with the spatial correlations and are **not modelled explicitly**.
- To let a CNN model channel relationships explicitly, the *squeeze* and *excitation* steps are added.

## 3.1 Squeeze: Global Information Embedding

- A convolution filter only sees the information inside its **local receptive field**.
- Lower layers in particular therefore struggle to take **global context** into account.
- The **Squeeze operation** addresses this.
- ![[Squeeze Formula.png]]
- Each channel's feature map u<sub>c</sub> is averaged over the whole spatial extent (H×W) to give a **single scalar** z<sub>c</sub>.
- The resulting z = [z<sub>1</sub>, z<sub>2</sub>, ..., z<sub>C</sub>] is a **global descriptive statistic** of each channel's activation.

## 3.2 Excitation: Adaptive Recalibration

### Excitation

- The summary z produced by Squeeze is used to learn the **interdependencies** between channels.
- ![[Excitation Formula.png]]
    - δ: ReLU
    - σ: sigmoid
    - W<sub>1</sub> ∈ R<sup>C/r×C</sup>: dimensionality reduction
    - W<sub>2</sub> ∈ R<sup>C×C/r</sup>: dimensionality expansion
    - r: reduction ratio (typically 16)
- W<sub>1</sub>z reduces the dimensionality so the channel interactions can be learned more efficiently.
- W<sub>2</sub> restores the reduced dimensionality.
- A sigmoid produces the gate vector *s* in the range [0, 1].
- The resulting s = [s<sub>1</sub>, s<sub>2</sub>, ..., s<sub>C</sub>] represents the **importance (weight) of each channel**.

### Scale

- Finally, the weights *s* obtained from Excitation are multiplied onto each channel's feature map.
- ![[Scale Formula.png]]
- In this step, channels with high importance are emphasised and channels with low importance are suppressed.
- Because an SE block behaves differently for every input, the **per-channel emphasis pattern changes with the input image** even within the same network.

## 3.3 Exemplars: SE-Inception and SE-ResNet

### SE-Inception

- The Inception module has several parallel branches (1x1, 3x3, 5x5 conv, etc.).
- The branch outputs are concatenated into the final feature map U; this transformation is F<sub>tr</sub>.
- ![[SE-Inception Module.png]]
    - Global pooling: the Squeeze step
    - FC → ReLU → FC → Sigmoid: the Excitation step
    - Scale

### SE-ResNet

- [[Deep Residual Learning for Image Recognition|ResNet]] uses skip connections, giving the form y = F<sub>tr</sub>(x, W) + x.
- Here F<sub>tr</sub>(x, W) is called the residual branch.
- **SE-ResNet** inserts the **SE block** at the end of this residual branch, before the addition with the identity (see the sketch below).
- ![[SE-ResNet Module.png]]
    - Global pooling: the Squeeze step
    - FC → ReLU → FC → Sigmoid: the Excitation step
    - Scale
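As a companion to the SE-ResNet figure, here is a rough PyTorch sketch of how an SE block can sit inside a ResNet bottleneck: the gating is applied to the residual branch output before it is added to the identity. The `SEBottleneck` name and the simplifications (stride 1, no projection shortcut, 1x1 convs standing in for the two FC layers) are my own and not the official implementation.

```python
import torch
import torch.nn as nn

class SEBottleneck(nn.Module):
    """ResNet-style bottleneck with SE recalibration on the residual branch.
    Simplified: stride 1 and matching channel counts, so the identity needs no projection."""
    def __init__(self, channels: int, width: int, r: int = 16):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, width, 1, bias=False), nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1, bias=False), nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        # SE block: 1x1 convs on the pooled 1x1 map are equivalent to the two FC layers
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                   # Squeeze
            nn.Conv2d(channels, channels // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1), nn.Sigmoid(),       # Excitation -> gates in [0, 1]
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.residual(x)
        out = out * self.se(out)        # Scale the residual branch only
        return self.relu(out + x)       # then add the identity shortcut

print(SEBottleneck(256, 64)(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```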
# 4. Model and Computational Complexity

- How much do the **FLOPs** and the overall **complexity** increase when SE blocks are added to an existing model?

## FLOPs and training time

- Comparing ResNet-50 with SE-ResNet-50: the baseline ResNet-50 needs ~3.86 GFLOPs and SE-ResNet-50 ~3.87 GFLOPs, a relative increase of only 0.26%.
- Squeeze is a single global average pooling, so its cost is tiny.
- Excitation is two small FC layers that operate only over the **channel dimension** (there is no spatial dimension), so it is also cheap.
- Scale is a single multiplication per channel, again very cheap.
- In short, compared with spatial convolutions (3x3 etc.), what SE adds is **very inexpensive**.
- Measured wall-clock training time on a server with 8× NVIDIA Titan X GPUs and batch size 256: ResNet-50 takes 190 ms and SE-ResNet-50 takes 209 ms per forward+backward step, an increase of only about **19 ms**.

## Additional parameters

- An SE block's parameters are essentially just the 2-layer FC of the Excitation step.
- The number of additional parameters therefore follows the formula below.
- ![[SE Additional Parameter Formula.png]]
    - r: reduction ratio
    - S: number of stages
    - N<sub>s</sub>: how many times the same block is repeated within stage s (block repetition count)
    - C<sub>s</sub>: the output channel count of that stage
- The Excitation MLP consists of FC(C<sub>s</sub> × C<sub>s</sub>/r) and FC(C<sub>s</sub>/r × C<sub>s</sub>); adding them gives (2/r) × C<sub>s</sub><sup>2</sup>.
- That block is repeated N<sub>s</sub> times within each stage, which yields the final formula above (a quick numerical check follows below).
- Comparing parameter counts, ResNet-50 has about **25M parameters** and SE-ResNet-50 about **27.5M**, roughly a 10% increase.
- Most of this increase comes from the **last stage, where the channel count is very large**: C<sub>s</sub><sup>2</sup> becomes huge there, so that stage contributes most of the extra parameters.
- **Removing the SE blocks in the last stage** costs less than 0.1% top-1 accuracy while reducing the parameter overhead from 10% to about 4%.
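To sanity-check the formula, here is a small Python calculation of the SE parameter overhead for ResNet-50, assuming the standard stage configuration (block counts 3/4/6/3 with output channels 256/512/1024/2048) and r = 16; biases are ignored for simplicity.

```python
# Extra parameters from SE blocks: (2/r) * sum_s N_s * C_s^2
r = 16
stages = [(3, 256), (4, 512), (6, 1024), (3, 2048)]   # (N_s, C_s) for ResNet-50

extra = sum(2 / r * n * c**2 for n, c in stages)
last = 2 / r * 3 * 2048**2                            # contribution of the final stage alone

print(f"total SE overhead : {extra / 1e6:.2f}M params")            # ~2.51M, i.e. ~25M -> ~27.5M
print(f"last stage only   : {last / 1e6:.2f}M ({last / extra:.0%})")  # the final stage dominates
```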
# 5. Implementation

- ResNet-50 and SE-ResNet-50 were trained with exactly the same training recipe.

## Training data and augmentation

1. Random-size cropping: random crops resized to 224x224 (299x299 for the Inception-ResNet-v2 variants); evaluation uses a centre crop.
2. Random horizontal flipping.
3. Input normalization: the mean channel values are subtracted.
4. Balanced sampling strategy: mini-batches are sampled so as to counter class imbalance → ensures class diversity within a batch.

## Training infrastructure

- Training uses the authors' in-house distributed learning system **ROCS**.
- This allows large networks to be trained efficiently in parallel.

## Optimization settings

- Optimizer: SGD with momentum 0.9
- Mini-batch size: 1024
- Initial learning rate: 0.6
- Learning rate schedule: decreased by a factor of 10 every 30 epochs

# 6. Experiments

## ImageNet Classification

![[SE ImageNet performance.png]]

- Every model with **SE** applied shows improved performance.
- In particular, comparing ResNet-50, SE-ResNet-50 and ResNet-101: SE-ResNet-50's top-5 error is *0.86%* lower than ResNet-50's, and not far from ResNet-101's.
- In terms of computation, SE models are **barely more expensive than their baselines**, and SE-ResNet-50 needs significantly fewer FLOPs than ResNet-101, which it roughly matches in accuracy.
- Moreover, SE blocks are **applicable to a range of models** such as VGG-16, BN-Inception and MobileNet, with meaningful performance gains in each case.

## Scene Classification

![[SE Scene Classification performance.png]]

- Scene recognition performance is evaluated on the Places365-Challenge dataset.
    - 8 million training images
    - 36,500 validation images
    - 365 scene categories
- SE blocks deliver a **generalised performance gain** not only on ImageNet but also on scene classification.

## Object Detection

![[SE Object detection performance.png]]

- Object detection performance is evaluated on the COCO dataset.
    - 80k training images
    - 40k validation images
    - 80 object classes
- AP@IoU=0.5: Average Precision where a detection counts as correct when the **IoU is at least 0.5**.
    - $IoU = Area(pred \cap gt) / Area(pred \cup gt)$: the **overlap ratio** between the predicted box and the ground-truth box (a small computation sketch follows below).
    - Measures 'how well objects are detected when 50% overlap or more counts as a hit'.
- AP (COCO metric): the **stricter averaged AP** proposed by COCO.
    - The IoU threshold is swept **from 0.5 to 0.95 in steps of 0.05** and the 10 resulting AP values are averaged.
    - So it evaluates not only detection at 0.5 but also **how precisely the boxes are localised (0.75–0.95)**.
- SE blocks improve performance not only for classification but also for the **detection task**.
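As a small illustration of the IoU definition above, here is a sketch of computing IoU for two axis-aligned boxes; the function name and the `(x1, y1, x2, y2)` box format are my own choices for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # union = area A + area B - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333... -> would not count at the 0.5 threshold
```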
- **๊นŠ์€ ๋‹จ๊ณ„ Stage 4~5** : ํด๋ž˜์Šค ๋ณ„ใ„น๋กœ ์ฑ„๋„ ํ™œ์„ฑํ™” ํŒจํ„ด์ด ๋‹ฌ๋ผ์ง„๋‹ค โ†’ ํ•ด๋‹น Excitation์—์„œ๋Š” **ํด๋ž˜์Šค ํŠน์ด์  ํŠน์ง•(class-specific features)๋ฅผ** ๊ฐ•์กฐํ•œ๋‹ค. - **๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„ Stage 5_2, 5_3** : ์ผ๋ถ€ ์ฑ„๋„์ด ํฌํ™” ์ƒํƒœ์— ๊ฐ€๊น๋‹ค โ†’ ์ผ๋ฐ˜์ ์ธ **residual block**๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ ์ž‘๋™ํ•œ๋‹ค. ์ฆ‰, Stage SE block์„ ์ œ๊ฑฐํ•ด๋„ ์„ฑ๋Šฅ ์†์‹ค์ด ๊ฑฐ์˜ ์—†๋‹ค. - ์ด๋ฅผ ํ†ตํ•ด์„œ **ํŒŒ๋ผ๋ฏธํ„ฐ ํฌ๊ธฐ๊ฐ€ ๋งŽ์€ ๋งˆ์ง€๋ง‰ SE Block์€ ์ œ๊ฑฐ**ํ•˜์—ฌ Model์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ ๊ฒŒ ๋งŒ๋“ ๋‹ค. # 7.Conclusion - ์ด ๋…ผ๋ฌธ์—์„œ๋Š” **Squeeze-and-Excitation (SE) Block**์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ๊ตฌ์กฐ์  ๋‹จ์œ„๋ฅผ ์ œ์•ˆํ–ˆ๋‹ค. - **SE Block**์„ ์ด์šฉํ•ด ๋„คํŠธ์›Œํฌ๊ฐ€ **๋™์ ์œผ๋กœ ์ฑ„๋„๋ณ„ ์ค‘์š”๋„๋ฅผ ์กฐ์ •**ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋งŒ๋“ค์–ด ํ‘œํ˜„๋ ฅ์„ ๋†’์˜€๋‹ค. - SENet์€ ๋‹จ์ˆœํžˆ ์„ฑ๋Šฅ๋งŒ ์˜ฌ๋ฆฐ ๊ฒŒ ์•„๋‹ˆ๋ผ, ๊ธฐ์กด CNN ๊ตฌ์กฐ๊ฐ€ ์™œ ์ฑ„๋„ ๊ฐ„ ๊ด€๊ณ„๋ฅผ ์ž˜ ๋ชจ๋ธ๋ง ํ•˜์ง€ ๋ชปํ–ˆ๋Š” ์ง€์— ๋Œ€ํ•œ ํ†ต์ฐฐ๋„ ์ œ๊ณตํ•œ๋‹ค. - SE Block์ด ํ•™์Šตํ•˜๋Š” ์ฑ„๋„ ์ค‘์š”๋„ ์ •๋ณด๋Š” **๋ชจ๋ธ ์••์ถ•(compression)**, **pruning(๋ถˆํ•„์š”ํ•œ ์ฑ„๋„ ์ œ๊ฑฐ)** ๊ฐ™์€ ์—ฐ๊ตฌ์—๋„ ํ™œ์šฉ๋  ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค. # Code review ๐Ÿ“š[์ฝ”๋“œ ๋ณด๋Ÿฌ๊ฐ€๊ธฐ](https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py) - ์ง์ ‘ SE-ResNet-50๊ณผ ResNet-50 Model์„ ๊ตฌํ˜„ํ•ด๋ณด๊ณ  ์„ฑ๋Šฅ์„ ๋น„๊ตํ•ด๋ณด์•˜๋‹ค. - DateSet์€ **CIFAR10**์„ ์‚ฌ์šฉํ–ˆ๋‹ค. ## ํ•™์Šต ๋ฐ์ดํ„ฐ ๋ฐ ์ฆ๊ฐ• 1. Random-size cropping : padding์„ 4๋กœ ํ•˜์—ฌ ํฌ๊ธฐ๋ฅผ ํ‚ค์šฐ๊ณ  ๋žœ๋ค์œผ๋กœ ์ด๋ฏธ์ง€๋ฅผ ์ž˜๋ž๋‹ค. 2. Random horizontal flipping ์‚ฌ์šฉ 3. Input normalization : ํ‰๊ท  ์ฑ„๋„๊ฐ’์„ ๋นผ์„œ ์ •๊ทœํ™” 4. Train Data๋ฅผ 8:2๋กœ ๋‚˜๋ˆ ์„œ ํ•™์Šตํ•  ๋•Œ๋Š” Train set์„ ์‚ฌ์šฉํ•˜๊ณ  ํ•™์Šต ์ค‘๊ฐ„์ค‘๊ฐ„ Valldation set์„ ์‚ฌ์šฉํ•˜์—ฌ top-1 error๋ฅผ ์ธก์ •ํ•˜์˜€๋‹ค. ## ํ•™์Šต ์ธํ”„๋ผ - ๊ทธ๋ž˜ํ”ฝ ์นด๋“œ : 4080 Super - Mixed Percision ์‚ฌ์šฉ ## ์ตœ์ ํ™” ์„ค์ • - Optimizer : SGD ์‚ฌ์šฉ, Momentum 0.9 - Batch size : 512 - ํฌ๋กœ์Šค ์—”ํŠธ๋กœํ”ผ ์†์‹ค ํ•จ์ˆ˜ ์‚ฌ์šฉ - ์ดˆ๊ธฐ ํ•™์Šต๋ฅ  : 0.001 - Learning rate schedule : 5 epochs๋งˆ๋‹ค ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ์—†์œผ๋ฉด x0.2, ์ตœ์†Œ 1e-6 ## SE-ResNet-50 - ๊ธฐ์กด ResNet์— SE-Block ํ•จ์ˆ˜๋ฅผ ๋ถ™์ด๋Š” ํ˜•์‹์œผ๋กœ SE-ResNet-50 Model์„ ๊ตฌํ˜„ํ•˜์˜€๋‹ค. - ๋…ผ๋ฌธ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋งˆ์ง€๋ง‰ Block์—์„œ SE Block์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์•˜๋‹ค. - SE-Block์˜ r = 16์œผ๋กœ ๋‘๊ณ  ์‹คํ—˜์„ ์ง„ํ–‰ํ–ˆ๋‹ค. ## ๊ฒฐ๊ณผ ![[SE-ResNet-50 VS ResNet-50 Loss, Error Comparsion.png]] | | Loss | Error | ์ด ํ•™์Šต ์‹œ๊ฐ„ | | ------------ | ------ | ------ | ---------- | | ResNet-50 | 0.3488 | 20.20% | 108m 57.1s | | SE-ResNet-50 | 0.2512 | 17.53% | 128m 38.5s | - SE-ResNet-50๊ณผ ResNet์˜ Test Error ์ฐจ์ด๋Š” ์•ฝ 2.67% ์ •๋„ ๋‚ฌ๊ณ  ์ด ํ•™์Šต ์‹œ๊ฐ„์€ ์•ฝ 20m, ์•ฝ 18% ์ •๋„ ์ฐจ์ด๊ฐ€ ๋‚ฌ๋‹ค. ![[SE-ResNet-50 VS ResNet-50 Top Error Comparsion.png]] - ์‹ค์ œ ์ตœ์ข… Top Error์—์„œ๋Š” Top-1, Top-5 ๋ชจ๋‘ SE-ResNet์ด ๋†’์€ ์„ฑ๋Šฅ์„ ๊ฐ€์ ธ์™”๋‹ค.