Deep Learning Program Simplifies Your Drawings | Two Minute Papers #107


Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. First, let’s talk about raster and vector graphics. What do these terms mean exactly? A raster image is a grid made up of pixels, and for each of these pixels, we specify a color. That’s all there is in an image – it is nothing but a collection of pixels. All photographs on your phone, and generally
most images you encounter are raster images. It is easy to see that the quality of such
images greatly depends on the resolution of this grid – of course, the more grid points,
the finer the grid is, the more details we can see. However, in return, if we disregard compression
techniques, the file size grows proportionally to the number of pixels, and if we zoom in
too close, we shall witness these classic staircase effects that we like to call aliasing. However, if we are designing a website, or
a logo for a company, which should look sharp on all possible devices and zoom levels, vector
graphics is a useful alternative. Vector images are inherently different from
raster images, as the base elements of the image are not pixels, but vectors and control
points. The difference is like storing the shape of
a circle on a lot of pixels point by point, which would be a raster image, or just saying
that I want a circle at these coordinates with a given radius. And as you can see in this example, the point
of this is to have razor sharp images at higher zoom levels as well. Unless we go too crazy with fine details,
file sizes are also often remarkably small for vector images, because we’re not storing
the colors of millions of pixels. We are only storing shapes. If we want to sound a bit more journalistic
we can kind of say that vector images have infinite resolution. We can zoom in as much as we wish, and we
won’t lose any detail during this process. Vectorization is the process where we try
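As a quick aside, the storage difference is easy to demonstrate in a few lines of Python. This is my own toy illustration, not code from the paper: the raster version of a circle needs one value per pixel, so its cost grows with the resolution of the grid, while the vector description is a single fixed-size shape.

```python
# Toy illustration of raster vs. vector storage for a circle.

def raster_circle(resolution, cx=0.5, cy=0.5, r=0.4):
    """Rasterize a circle onto a resolution x resolution grid of 0/1 pixels."""
    grid = []
    for y in range(resolution):
        row = []
        for x in range(resolution):
            # Sample each pixel at its center, in [0, 1] coordinates.
            px, py = (x + 0.5) / resolution, (y + 0.5) / resolution
            row.append(1 if (px - cx) ** 2 + (py - cy) ** 2 <= r * r else 0)
        grid.append(row)
    return grid

# The vector version is just a shape description -- e.g. a minimal SVG string.
vector_circle = '<svg viewBox="0 0 1 1"><circle cx="0.5" cy="0.5" r="0.4"/></svg>'

for res in (16, 64, 256):
    print(f"raster {res}x{res}: {res * res} pixel values")
print(f"vector: {len(vector_circle)} characters, at any zoom level")
```

Note how the raster cost is quadratic in the grid resolution, while the vector string never changes – that is the "infinite resolution" property in miniature.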
to convert a raster image to a vector image. Some also like to call this process image
tracing. The immediate question arises – why are we
not using vector graphics everywhere? Well, one, the smoother the color transitions
and the more detail we have in our images, the quicker the advantage of vectorization
evaporates. And two, note that this procedure is not trivial, and we are often at the mercy of the vectorization algorithm in terms of output quality. It is often unclear in advance whether it
will work well on a given input. So now we know everything we need to know
to be able to understand and appreciate this amazing piece of work. The input is a rough sketch, that is, a raster image, and the output is a simplified, cleaned-up, and vectorized version of it. We’re not only doing vectorization, but simplification
as well. This is a game changer, because this way,
we can lean on the additional knowledge that these input raster images are sketches, hand-drawn images, so there is a lot of extra fluff in them that would be undesirable to retain in the vectorized output – hence the name, sketch simplification. In each of these cases, it is absolute insanity
how well it works. Just look at these results! The next question is obviously, how does this
wizardry happen? It happens by using a classic deep learning
technique, a convolutional neural network, of course, that was trained on a large number
of input and output pairs. However, this is no ordinary convolutional
neural network! This particular variant differs from the standard
well-known architecture as it is augmented with a series of upsampling convolution steps. Intuitively, the algorithm learns a sparse and concise representation of these input sketches; this means that it focuses on the most defining features and throws away all the unneeded fluff. And the upsampling convolution steps make
it able to not only understand, but synthesize new, simplified, and high-resolution images
that we can easily vectorize using standard algorithms. It is fully automatic and requires no user
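If it helps to picture what those upsampling steps do, here is a minimal sketch in plain Python. This is my own simplified illustration, not the paper’s network: a real encoder uses learned convolutions rather than average pooling, and a real decoder uses learned upsampling convolutions rather than nearest-neighbor copying – but the resolution bookkeeping is the same: compress the sketch into a compact representation, then grow it back into a full-resolution output.

```python
# Toy encoder-decoder resolution bookkeeping: downsample to a compact
# representation, then upsample back to the input resolution.

def downsample2x(img):
    """2x2 average pooling: halves each spatial dimension (the 'encoder' step)."""
    h, w = len(img), len(img[0])
    return [
        [(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
          + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
         for x in range(w // 2)]
        for y in range(h // 2)
    ]

def upsample2x(img):
    """Nearest-neighbor upsampling: doubles each spatial dimension (the 'decoder' step)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                   # repeat each row vertically
    return out

# A 4x4 "sketch": two poolings compress it to 1x1, two upsamplings restore 4x4.
sketch = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
coded = downsample2x(downsample2x(sketch))       # 4x4 -> 1x1 compact code
restored = upsample2x(upsample2x(coded))         # 1x1 -> 4x4 output resolution
print(len(restored), len(restored[0]))           # 4 4
```

In the actual architecture, the compact representation is learned so that only the defining strokes survive the compression, and the upsampling path synthesizes the clean, high-resolution drawing from it.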
intervention. In case you are scratching your head about
these convolutions, we have had plenty of discussions about this peculiar term before,
I have linked the appropriate episodes in the video description box. I think you’ll find them a lot of fun – in
one of them, I pulled out a guitar and added reverberation to it using convolution. It is clear that there is a ton of untapped
potential in using different convolution variations in deep neural networks. We’ve seen a DeepMind paper earlier that used dilated convolutions for state-of-the-art speech synthesis – that is a novel convolution variant, and this piece of work is no exception either. There is also a cool online demo of this technique
that anyone can try. Make sure to post your results in the comments
section! We’d love to have a look at your findings. Also, have a look at this Two Minute Papers
fan art. A nice little logo one of our kind Fellow
Scholars sent in. It’s really great to see that you’ve taken the time to help out the series – that’s very kind of you. Thank you! Thanks for watching, and for your generous
support, and I’ll see you next time!
