I combined my previous posts on image captioning and visual question answering and extended them to a wider topic: connecting computer vision and natural language.
I will start from the problem of semantic segmentation, introduce how to use CNNs to solve it, and discuss fully convolutional networks (FCNs), a widely used framework for semantic segmentation, in great detail. I will also analyze the MXNet implementation of FCNs.