Designing for Accessibility (Google I/O’19)


[MUSIC PLAYING] ELISE ROY: You see this? This is my old hearing aid. And this? This is one that I wear today. What’s different? This one’s flesh-colored
and this one’s red. It may seem like a small design
tweak, but it changed my life. It made me feel as
if I belong again. You see, right before fifth
grade, my mom sat me down and she said, you need to tell
Michelle about your hearing loss. Michelle was my best
friend, but I hadn’t seen her all summer long. And two weeks before, I was told
that I was losing my hearing and it was going to just get
worse and worse and worse. But I was 10 years old. I didn’t know how to deal
with heavy stuff like this. And so when I called her
up before school started, we talked about our summers,
we talked about sports, we talked about everything
but my hearing loss. On the second day of school,
she was standing behind me in line and she tapped
me on the shoulder, pointed to my hearing aid
and said, what’s that? It was an innocent question, but
I didn’t know how to respond, and so I said,
it’s a hearing aid, as if she was
stupid not to know. And then there was just silence. And this silence followed us for
the rest of our relationship. She never asked me again
about my hearing loss, and I spent that year watching
her slowly drift away. I felt different, as
if I didn’t belong, and she being just 10
years old didn’t know how to deal with this difference. Many kids avoid difference
because they’re just not sure what to do with it. And so I found out
quickly that I didn’t want to be seen as different. And so began this
long struggle to try to prove that, although
I had a hearing loss, it didn’t change me. I was still normal. And I did this by overachieving. Going– when I went to
college, playing just one sport wasn’t enough. I had to play two. I had to go to an Ivy League. I became one of the first
few deaf lawyers in the US. I did some work at
the United Nations, and then I became a designer. But somewhere along the
way, I realized something. I am the new normal. You know that TV show
“Orange is the New Black”? Well, I am the new normal. Difference is the new normal. Difference, even if it
seems like limitation, is what makes us thrive,
what makes us valuable. Now, I would like you
to think about this. Disability encompasses all of us. If we all live long enough,
we will all get a disability at some point in our lives. And who here has broken
their legs or their arms? Really, that’s it? Come on. That’s an example of a
temporary disability. But what comes next is key. We also all experience something
called momentary disabilities. Now, I’d like to
call up a volunteer to come up and help me
demonstrate what they are. [APPLAUSE] What I’d like you
to do is to pick up that box and those
books, and then come over while carrying the
box, take a sip of the water. You’re going to have
to open it, though. You have to open it. That was pretty impressive. Most pe– AUDIENCE: [INAUDIBLE] [APPLAUSE] ELISE ROY: Was it easy? Hard? AUDIENCE: Totally. Kind of fun. ELISE ROY: Yes. So thank you very
much for your help. As we go about our lives,
we encounter situations where we will be
momentarily disabled, whether we’re carrying a box
and trying to open up a door. And so disability really
encompasses all of us. There are just some of
us that experience it a lot more than others. Now, as a lawyer, I fought
for equality in race, gender, and disability, and you
would think that I would have been outfitted with
the skills necessary to feel accepted and valued by society. But to my surprise, I
found the strongest tools when I transitioned
from law to design. Design has this powerful
ability to shift perceptions, but it’s up to you to use it. Up to you. So finally, it happened. After law school, I went
back to the audiologist to get a new hearing
aid, and I was thrilled because
they weren’t just these awful flesh-colored
things anymore, but they invented red ones
and blue ones and green ones. So I opted for the
bright red one, and then something
magical happened– my hearing aid became cool. People started saying things
like, [GASP] love the red! This little thing created
this huge shift in my life. It allowed me to
celebrate my difference and it allowed others to
join in on celebrating this difference with me. This is because it
opened up the door to conversing about
difference without being focused on limitations.

MICHAEL BRENNER: OK. Thank you, Elise. That was a beautiful talk, and
it was a very good introduction to our story, which we
call Project Euphonia. So we’re going to
start the story by telling you a story about
one of our colleagues at Google. So this is Dimitri Kanevsky,
and Dimitri, it turns out, is a mathematician. He’s worked at some of the great
institutions for mathematics in the world. But for the last
two decades, he’s really been thinking
primarily about designing for accessibility–
that is, trying to invent technology that was
helpful in some way or other. So Dimitri himself has a
disability– he’s deaf– and he also has a very
strong Russian accent. So the first time that
at least I met Dimitri, I found it very
hard to understand what he was talking about. But, you know, hanging
out with Dimitri, eventually you get the idea. So it turns out
that our computers have the same problem– that is, when Dimitri
speaks to his phone as I might speak to
my phone, his phone doesn’t understand
him very well. And this is a clip in which
he explains that himself. So what you see from this
is that the phone that was being shown was a phone
that was running the Google Cloud Speech Recognition Model. And what I would claim
is that if you only looked at the phone,
that you would not be able to really understand
the thread of what Dimitri was trying to communicate. And so we asked ourselves the
question, why is that the case? Why is it that the phone was
not able to understand Dimitri but, for example, it is
able to understand me? And in order to
explain this, I need to tell you a little bit about
how speech recognition works and why it is that speech
recognition has gotten so much better over the past
number of years. So when we speak, what we’re
doing is creating a waveform. So a waveform is
just a sound wave and it looks rather
unintelligible. The job that we’re
asking a computer to do is to take the
picture on the left and to somehow turn it into
the words that are being said. So as you all know, humans
have gotten very good at interpreting
pictures, and so the way that speech recognizers work
is we first take the waveform and turn it into a picture. The picture is
called a spectrogram and it’s just a
picture of colors, but it’s still unintelligible
as to what was being said. And then what we do
is take the picture and stick it into a
neural network, which is a big computer program that
has lots of parameters in it. And the idea is to make the
computer program so that it outputs what was being said. Now of course, just
like us, if you don’t train the
computer program, it has no idea what
was being said. And so what we do is we take all
of the numbers in this computer program– there are millions of
numbers that you have to tune– and we give it one sentence
at a time, somebody saying something. And the computer
predicts it’s saying this and then it gets it wrong,
and we bang the computer over the head, twiddle the
parameters around a little bit until eventually by giving it
lots and lots of sentences, it gets better at
speech recognition. And we have phones
that work for people whom the computer has heard. Now in order to
do that, it takes huge numbers of sentences. So tens of millions,
say, of sentences need to be given to
the computer for it to develop a general
type of understanding. But the problem is that
for people like Dimitri, or indeed anyone who
speaks in a way that is different than
the pool of examples that the computer was given,
the phone can’t understand them just because it’s never
heard the example before. And so the question that we
asked, and this was a question that we started asking in
collaboration with an ALS foundation that we’ve
been working with– ALS TDI, who gave
me this T-shirt– so we asked whether or not
it’s possible to basically fix the speech recognizers
to work for people who are hard to understand. And Dimitri is amazing and
he decided to take this on. So remember what I said–
it takes tens of millions of sentences to train
a speech recognizer. It’s completely crazy to ask
someone to sit and record tens of millions of sentences. But Dimitri has a
great spirit, and so he sat in front of his
computer and he just started recording sentences. And so, for example,
here is a sentence– what is the temperature today? And so the computer
would say “What is the temperature today?” And Dimitri would read “What
is the temperature today?” And he sat there for days
recording these sentences, until he had recorded
upwards of 15,000 sentences, and we then decided to
train the speech recognizer to see if it was able
to understand him. And I should tell
you that none of us knew whether or not it was even
conceivable that this could work because, as I said, it took
many more sentences to train the thing in the first
place for many people who speak in a way that is more
typical for speech recognizers. So here’s Dimitri at the end. He was still happy
after doing this. And then here is the– I’m now going to show you a
quick clip of what happened. And so what you see is that
the device on the right was able to understand Dimitri,
whereas the device on the left, which is the Google
Cloud device, was not. And this really
gave us confidence that it was possible to
make progress on this task. And so we started working in
earnest with our collaborators at ALS TDI, who recruited a
large number of people with ALS to start recording
sentences to see if this works. Now, of course, getting someone
to record 15,000 sentences is completely crazy. That’s never going
to work at scale. And so instead we were
investigating technically whether or not it’s possible
to make progress with smaller numbers of sentences. And what I can report to you
is that we’re making progress. We’re not there yet. We do not feel that we’ve
solved this problem in any way. But we’re working
hard, and there are groups of engineers at
Google who are working hard. And this is just
a little example. So the left column is
the ground truth phrases, the rightmost column is
what Google Cloud recognizes on this particular person
who happens to have ALS, and the middle column is what
our recognizer is right now doing, and we’re hard at
work trying to figure out if it is possible to
make this work for people without requiring so
much training data. So this is Dimitri
as of this week. So Dimitri now carries
around with him about five different phones
in his pocket, each of which has a different speech
recognizer on it, and he’s testing and trying
to figure out the best way. And it is our hope
that if we can get this to work with Dimitri’s
help and with all of your help, and hopefully
people will record, make recordings for us– the
reason for this call for data that Sundar made is that we
need more data from people, just recordings to be able
to make this work. Hopefully we will get there. That is our goal. And so this sort of
is the general goal of Euphonia’s
mission, which is what we would like to do is
to improve communication technology by including as many
people as possible, whatever features that the
people have and whatever means to communicate. Of course, speaking is an
important way of communicating, but it is not the only
way that we communicate. We communicate with each
other by looking, by feeling, by doing so many
different things. And there are people who don’t
have the ability to speak, and so now I’m going
to turn it over to Irene, who will start to
talk about other speaking modalities.

IRENE ALVARADO: All right. Thanks, Michael. [APPLAUSE] All right, so so far
we’ve talked about Dimitri and about speech, but what about
other forms of communication? What about folks who can’t
communicate verbally? We want to show you
how we’re approaching the research for those
types of cases as well. So for that, I’d
like to introduce our second protagonist for the
day, the amazing Steve Saling. He’s an incredible person. He had a brilliant career
as a landscape architect and when he learned
that he has ALS, he set about to rethink how
people with his condition get care. He also started
thinking about how he could leverage technology
to create more independence for himself, so
that he didn’t have to rely as much on other
people to take care of him. And one thing he
helped do is he helped create a smart
home-like system that lets him request an elevator
and close the blinds, turn on the music, all
by using his computer. It’s really amazing. So Steve happened to be one of
the perfect persons to partner with for this research because
he is a technologist himself. And speaking of
computers, we want to show you how many folks who
have ALS communicate today. They use something called
an eye gaze pointer to type out letters one by one. So these are two
different systems that they can use: either
a keyboard or something on the right called Dasher. And it works– it does the job– but if you can imagine,
it’s just a little bit slow. And what he’s missing is a layer
of communication that all of us are familiar with– interruptions,
mannerisms, jokes, laughs. Synchronous communication
that comes by quickly. That’s something that’s really
hard for Steve and people with his condition to do. So something we
wanted to try with him was to see if he could train his
own personal machine learning models to classify
different face expressions, and the thought was,
is this even useful for him to be able to trigger
things more quickly so that he might be able to open
his mouth and trigger something on the computer
or raise his eyebrows and trigger something else? It was a question. It’s a research question. And we didn’t know the answer. So with Steve’s feedback, his
ideas, and a lot of testing, we developed a
machine learning tool that anybody actually can
use to train classification models in the browser. And by classification,
I mean a model that tries to predict what
category a certain type of input belongs to. Let me show you an example
so you see how it works. This is my colleague Barron and
he’s training two classes, one to detect his face and one to
detect this really cute cat pillow that he has. So he’s giving the
computer a bunch of data. He’s training it,
waiting for it to finish, and then he’s testing
the model on the right. And then he publishes the model. All of this is happening in
the browser in real time, and the images– the processing is
happening in his computer, so the images aren’t
being sent to a server. It’s all happening in his
computer in the browser. So we’re calling this
Teachable Machine. It’s a tool for anybody
to train machine learning models in the browser without
having to know how to code. And it’s actually built
on top of TensorFlow.js, so all of the underlying
technology is free and it’s open source
for you to use. So, OK, how is Steve using this? Well, as I mentioned, he’s
training face classification models for cases where he
might want a faster response time than what he can achieve
with his eye gaze pointer, and Teachable Machine
is the prototyping tool that’s allowing him
to do this and explore what types of use cases are
actually helpful for him. So why is this useful? Well, Teachable Machine is
situational in two ways, right? ALS actually changes over time,
so people with the condition, they deteriorate over time. So Steve might be able
to do an expression today that he can’t do in a year. He has to be able to retrain
those models on his own, perhaps week by week, month
by month, as he needs it. And the second thing
is that you might imagine that he might want
to use different models for different use cases. One thing that he
actually tried was training a model that
would trigger an air horn, like a sound of an air horn
when he opens his mouth, and to trigger a boo when
he raises his eyebrows. And he used it one night
to watch a basketball game with one of his favorite teams
to react quickly to the game as it progressed. Unfortunately that night,
his team didn’t win, but it was actually
really fun to set up. So we’ve got a long way
to go with this research. This is really
only the beginning, and we hope to expand
the tool to support many more modes of input. The tool itself will be
available later this year for anyone to train their
classification models, but as I said before,
all of the technology is already available
on TensorFlow.js. We’re committed to working with
people like Steve and Dimitri to make their
communication tools better, and the idea really is to start
with the hardest problems that might unlock innovations
for everyone. But it’s our sincere hope
that this kind of research might help people with other
types of speech impairments– people with cerebral
palsy or Parkinson’s or multiple sclerosis. And maybe, perhaps
one day, it could be helpful to even more people–
people who freely communicate today, maybe like folks who have
an accent in a second language. And in fact, we started
calling this approach to building “Start with
One, Invent for Many.” We think anybody
can work this way, and you can apply it to many
more types of problems. The idea is actually
quite simple– so start by working
together with one person to solve one problem, and that
way you can be sure that what you make for them will
be impactful to them and the people in their lives. And sometimes– it
doesn’t always happen, but sometimes– what
you make together can go on to be useful
to many more people. Start with One, Invent for Many. If you’d like to hear more
about this project and Start with One, if you’d like to
hear more about Dimitri, Steve, and actually play with
Teachable Machine, we have all of these projects
in the Experiment Sandbox tent, which is actually
really close to the stage. And finally, lastly,
we’d like to invite you to help this research effort. As Michael was saying,
we don’t expect people to train 15,000
phrases in order to get a model like
this, so we actually need volunteers to share
their voice samples with us so that we may one day
generalize these models. So if you or anyone you know
has hard-to-understand speech, we’d like to invite
you to go to this link and submit some samples,
and hopefully one day we can make these models more
widely accessible to everyone. Thank you. [APPLAUSE] [MUSIC PLAYING]
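
Michael Brenner describes the first step of speech recognition as turning the waveform into a picture called a spectrogram. The sketch below illustrates that one step in plain Python. It is not the code behind Google Cloud Speech or Project Euphonia; the sample rate, frame size, hop length, and the synthetic 1 kHz tone are all assumptions chosen to keep the example small.

```python
import cmath
import math

def spectrogram(waveform, frame_size=64, hop=32):
    """Slice the waveform into overlapping frames and return, for each frame,
    the magnitude of every non-negative frequency bin."""
    frames = []
    for start in range(0, len(waveform) - frame_size + 1, hop):
        frame = waveform[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, x in enumerate(frame)
        ]
        # Naive discrete Fourier transform (slow, but dependency-free).
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

# Assumed test signal: a pure 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
TONE_HZ = 1000
waveform = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(256)]

spec = spectrogram(waveform, frame_size=FRAME_SIZE)
# The brightest "pixel" in the first column of the picture marks the tone.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spec), "frames;", "strongest frequency:", peak_hz, "Hz")
```

Each inner list is one column of the picture: time runs across the frames and frequency runs along each frame. Real speech produces a far busier picture than this single tone, which is why a trained neural network is needed to read it.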

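Both halves of the talk rest on the same loop: the model predicts, gets an example wrong, and has its parameters nudged a little. Irene Alvarado adds that Steve needs to retrain his expression models as his condition changes. The sketch below shows that loop with a perceptron, a much simpler model swapped in for illustration; it is not how Teachable Machine or TensorFlow.js is implemented. The two-number feature vectors stand in for camera images and are made up, as are the session values. Only the two actions (air horn for an open mouth, boo for raised eyebrows) come from the talk.

```python
def predict(weights, bias, features):
    """Return 1 (mouth open) if the weighted score is positive, else 0 (eyebrows raised)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train(examples, weights=None, bias=0.0, lr=0.1, epochs=50):
    """Perceptron rule: predict, and nudge the parameters only when the guess is wrong."""
    weights = list(weights) if weights else [0.0] * len(examples[0][0])
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * x for w, x in zip(weights, features)]
                bias += lr * error
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias

# Hypothetical features: [mouth_openness, eyebrow_height], each between 0 and 1.
ACTIONS = {1: "air horn", 0: "boo"}
first_session = [([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.1, 0.9], 0), ([0.2, 0.8], 0)]
weights, bias = train(first_session)

# Later recordings in which the two expressions look more alike (made-up values).
later_session = [([0.6, 0.7], 1), ([0.65, 0.7], 1), ([0.1, 0.9], 0), ([0.15, 0.85], 0)]
stale_guess = predict(weights, bias, later_session[0][0])  # the old model misreads this

# Retrain on the new recordings, starting from the earlier parameters.
new_weights, new_bias = train(later_session, weights, bias)
fresh_guesses = [predict(new_weights, new_bias, f) for f, _ in later_session]
print(ACTIONS[fresh_guesses[0]], "/", ACTIONS[fresh_guesses[2]])
```

Starting the second call to `train` from the earlier weights rather than from zero is a design choice made for this sketch: a few new examples are enough to move the decision boundary, which mirrors the goal of retraining week by week without recollecting everything.
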
One comment

  • Grayson Peddie

    Google Developers, how can I help correct the closed caption when I see "hearing laugh" and "hearing lapse?" I think she said "hearing loss." It seems the automatic closed captioning is not picking up the "ss" sound when she says "hearing loss."

    4:03 "Disability encompasses ah of us" should be "Disability encompasses all of us." Seems to me the automatic closed caption thinks she said "ah" instead of "all."

    4:15 For confirmation, did she say "yeah" in "who yeah has broken their legs or arms?"

    6:26 "From na2 design." I don't know if I was lipreading right, but didn't she say "from my to design?" Could automatic closed captioning system do lipreading in the future?

    7:20 Again, I did some more lipreading and I think she said "it allowed me to celebrate" and the dynamics of her voice caused automatic closed caption to miss "me" after "allowed" as in "it allowed to celebrate."

    7:58 "So this is Dimitri connects key" should be "So this is Dimitri Kanevsky."

    20:57 "Start with one invent for money" should be "start with one invent for many." At least she said "start with one, invent for many" the second time and automatic closed caption got it right.
