I used Captum to interpret the output of a MobileNetV2, visualizing the regions of the input image that most strongly drove the model's prediction.
I combined my previous posts on image captioning and visual question answering and extended them into a broader topic: connecting computer vision and natural language.
Recently, while working on PyTorch multi-GPU training, I ran into a nightmarish GPU memory problem. After some expensive trial and error, I finally found a solution.
I will start from the problem of semantic segmentation, introduce how CNNs can solve it, and discuss fully convolutional networks (FCNs), a widely used framework for semantic segmentation, in great detail. I will also analyze the MXNet implementation of FCNs.
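The core FCN idea can be sketched in a few lines: replace the fully connected classifier with 1x1 convolutions so the network emits a coarse per-pixel score map, then upsample it back to the input resolution. The toy backbone below is an assumption for brevity (the post works with VGG/ResNet backbones in MXNet; this sketch uses PyTorch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Minimal FCN sketch: conv backbone + 1x1 classifier + upsampling."""

    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(  # downsamples spatially by 4x
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 conv plays the role of the fully connected classifier,
        # producing one score channel per class at every spatial location.
        self.classifier = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        score = self.classifier(self.backbone(x))
        # FCN learns this upsampling with a transposed convolution;
        # bilinear interpolation is the simplest stand-in here.
        return F.interpolate(score, size=(h, w), mode="bilinear",
                             align_corners=False)

x = torch.rand(2, 3, 64, 64)          # a batch of stand-in images
out = TinyFCN()(x)                    # per-pixel class scores
print(out.shape)                      # (batch, classes, height, width)
```

Taking an `argmax` over the class dimension of the output yields a dense segmentation mask at full input resolution.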