[Conceptual Background] Rethinking Weight Decay
In this post, I want to reflect on weight decay, an almost indispensable element of deep learning. Most existing posts about WD simply explain the phenomenon itself, so here I examine the contexts in which WD is actually used and the sense in which it has come to be used recently.
Recap of Weight Decay (WD)
Recently, when training m...
[Information Theory] Asymptotic Equipartition Property (AEP)
Recently, while studying AI, I naturally began thinking a lot from the perspective of information theory. In the process, I revisited the theory itself in depth, and since my blog had been quiet for a while, I thought this would be a good opportunity to organize and post what I have studied, centered on the textbook “Elements o...
[Coding] Super Easy Guide to Applying PyTorch DDP
This is my first coding-related post, and the topic is DDP. As model capacities have grown, using multiple GPUs has become essential, which makes using DDP effectively very important. In this post, I will share how to apply DDP, cutting to the chase on the general mechanics and focusing simply and clearly...
[Theoretical Background] Convex Optimization 2
I’m back with the second post on optimization. Last time, we examined how the bound unfolds and how convergence proceeds when the function is L-Lipschitz. This time, let's examine what happens when stronger assumptions are imposed. Without further ado, let's jump right in.
$\star$ Recap (Previous Post)
In the previous post, assuming a f...
[Theoretical Background] Convex Optimization 1
This year, I took a class involving the concept of optimization for the first time. In the deep learning field, it is accepted as a foundational discipline for mathematically verifying whether convergence happens and, if so, how fast. While the Meta Learning and Generative Models I posted about previously, and the Foundation Mod...