
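Column introduction: 《香儂說》 (Shannon Says) is an interview series produced by ShannonAI (香儂科技) on machine learning and natural language processing. The questions are written by the “Shannon Think Tank,” a group of computer science PhD students from Stanford University, MIT, Carnegie Mellon University, the University of Cambridge, and other well-known universities. The series interviews leading researchers in artificial intelligence and natural language processing at top research institutions (Stanford University, MIT, Carnegie Mellon University, Google, DeepMind, Microsoft Research, OpenAI, and others), as well as rising academic stars whose pioneering PhD work took them straight into faculty positions at top universities. The guests share the inspiration behind their best-known work and their view of where the field is heading.
This issue’s guest is Dan Jurafsky, tenured professor of computer science at Stanford University. ShannonAI plans to follow with interviews with Eduard Hovy (Carnegie Mellon University), Anna Korhonen (University of Cambridge), Andrew Ng (Stanford University), Ilya Sutskever (OpenAI), William Yang Wang (University of California, Santa Barbara), Jason Weston (Facebook AI Research), Steve Young (University of Cambridge), and others. Stay tuned.
Dan Jurafsky, professor of computer science at Stanford University, is a leading figure in natural language processing. His book Speech and Language Processing has been translated into more than 60 languages and is the most classic NLP textbook worldwide. He received best paper awards at ACL 2006, EMNLP 2013, and WWW 2013; a MacArthur Fellowship (the “genius grant,” described as the highest cross-disciplinary award in the United States) in 2002; the Cozzarelli Prize of the US National Academy of Sciences in 2017; and the Gould Prize (古爾德獎) in 2015.
Professor Jurafsky has more than 30,000 citations on Google Scholar and an h-index of 75. His main research areas include natural language understanding, dialogue systems, and the relationship between human and machine language processing, and he has long applied NLP methods to problems in the social and behavioral sciences. He also has a strong interest in the linguistics of food and in the Chinese language. His popular science book The Language of Food: A Linguist Reads the Menu has been translated into many languages, topped an international bestseller list in 2015, and was nominated for a 2015 James Beard Award.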

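▲ Figure 1. Dan Jurafsky, tenured professor of computer science at Stanford University, formed a lasting bond with China as early as the 1980s. This photo was taken in 1985 while he was studying Chinese at Peking University (the young Dan Jurafsky is second from the right in the second row).
© Professor Jurafsky’s personal homepage: https://web.stanford.edu/~jurafsky/
ShannonAI: You are now editing the third edition of Speech and Language Processing, the most widely used textbook in natural language processing (NLP). While editing it, what is your general feeling about the changes in the NLP field over the past few years? What is the most exciting part? And what, if anything, is the most disappointing?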

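▲ Figure 2. Speech and Language Processing, by Dan Jurafsky and James Martin, has been translated into more than 60 languages and is the most classic NLP textbook worldwide.
Jurafsky: This is such an exciting time to be in the field! I am of course especially excited by deep learning. What I look forward to most is the big change coming to natural language generation, a field with a great deal of potential that was marginalized within NLP for far too long.
The use of embeddings, and especially contextualized embeddings, is also very exciting: it lets us build models that capture how word meaning shifts across time, place, and context. The other thing is the growing social awareness in the NLP field: people have come to realize both that our models carry biases and that these models can be used to model and understand human interaction, and so make the world a better place.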
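ShannonAI: Many NLP researchers have a strong linguistics background or even come from linguistics. Yet as deep learning methods become more and more dominant in NLP, some say (and this may even count as a trend) that linguistic knowledge is no longer a prerequisite for NLP research: training a bidirectional LSTM RNN is enough. Could you comment on this view? And what do you make of Professor Frederick Jelinek’s famous quote, “every time I fire a linguist, the performance of the speech recognizer improves”?
Jurafsky: I firmly believe that to contribute to a domain of knowledge, it helps to know that domain well. So I think it remains crucial for NLP researchers to understand deeply how language works and to be familiar with linguistic phenomena such as reference, compositionality, variation, grammatical structure, connotation, affect, style, and conversational interaction.
But understanding language and linguistic phenomena does not mean blindly applying inappropriate linguistic models. The Jelinek quote (he told me his actual wording was more diplomatic: “Anytime a linguist leaves the group the recognition rate goes up”) was really about pronunciation modeling in speech recognition. It turned out then, and it is still true now, that given enough data, machine learning handles phonetic variation better than hand-written phonetic rules.
So I think the field will continue to be an ongoing integration of machine learning with linguistic structure and knowledge, and each researcher will decide how to weight these two important components at different times and in different situations.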
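ShannonAI: From a historical perspective, major breakthroughs have usually happened first in speech processing and then spread to NLP. For example, in the early 1990s Peter Brown and Robert Mercer, who came from the speech field, introduced statistical machine learning models to NLP and revolutionized it. Deep learning methods were first applied to speech in 2012 by Li Deng and colleagues at Microsoft Research, with breakthrough results, while large-scale use of deep learning in NLP came only in 2013-2014. Looking back, can you explain why this happens, or is it just a coincidence?
Jurafsky: As you say, statistical models did come to NLP from speech, and deep learning came to NLP from speech and vision. I believe this is no coincidence at all. In general, innovation comes from the intersections that form when people who work in different fields are brought together. Studies of Nobel Prize winners have shown that they tend to be “bridge-builders,” people who connect methods from different fields. So my advice to young scholars is to draw on interdisciplinary ties and talk to people in related but different fields. That is how major breakthroughs happen.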
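ShannonAI: You spent three years on speech processing research as a postdoc. Could you describe how those years influenced your research career in NLP?
Jurafsky: The influence was enormous. My postdoc ran from 1992 to 1995, exactly when machine learning, probability theory, graphical models, neural networks, and early versions of embeddings were all entering NLP at once. I was lucky enough to do my postdoc in a speech recognition and neural network lab at the International Computer Science Institute (ICSI) at UC Berkeley, working with Nelson Morgan and Jerry Feldman. That lab meant a great deal to me, and my advisors’ “big tent” view of the NLP field shaped me strongly: you have to take text, speech, dialogue, and cognitive science seriously, and give them as much thought as engineering.
We did not know at the time what would become the dominant paradigm: machine learning in general, or specifically graphical models or neural networks. Training neural networks was much slower then because there were no commercially available GPUs, so the lab had to build its own vector processors, and a speech recognition network with one 4000-unit hidden layer counted as enormous and took forever to train. If you had asked me to guess then, I would not have predicted that deep learning would arrive two decades later in the form we see today. Amusingly, the first edition of the textbook Martin and I wrote described speech recognition using only neural networks; in the second edition we took neural networks out and described Gaussian models instead; and in the third edition we are putting neural networks back!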
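ShannonAI: You and your students have used NLP techniques to study many important questions in the social sciences (e.g., Garg et al. PNAS 2018; Voigt et al. PNAS 2017, winner of the Cozzarelli Prize). What advice do you have for NLP researchers who want to do more interdisciplinary research of this kind?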

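▲ Figure 3. In Voigt et al. PNAS 2017, Dan Jurafsky’s lab worked with the Stanford psychology department, using NLP methods to automatically assess how respectfully police officers speak to people of different races.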
© Voigt et al. PNAS 2017
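Jurafsky: Talk to social scientists! I think that if you are going to study anything related to humans, it is very important to collaborate with experts in the social sciences. Social scientists have not only thought more about people and social relations; compared with computer scientists, they also tend to have more experience with statistical and causal inference. Once again, it is interdisciplinarity that leads to innovation!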
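ShannonAI: In recent years there has been a lot of concern about bias in machine learning models. The problem seems especially acute in NLP, because data collected in naturalistic settings (e.g., Twitter) inevitably contains biases such as sexism and racism. Blindly training deep neural networks on such data leads to biased model predictions. How do you see this problem?
Jurafsky: Yes. Millions, perhaps even billions, of people now use NLP tools every day, for machine translation, information retrieval, recommendations, and so on, and that is an exciting development. But as you say, such widespread use has side effects! The ethical side of NLP work now affects society, and more and more people, including young practitioners in the field and the consumers of our science and technology, are paying closer attention to these effects.
I am glad that we are finally facing these issues! Perhaps we can learn from fields that have long had to confront such ethical dilemmas and social challenges, such as medicine, nuclear physics, biology, and the social sciences. You asked what we should do when accuracy has to be traded off against bias. My answer is that we need to keep asking ourselves what the ultimate goal of our work is. We now realize that this goal is not just to improve accuracy or speed, but to truly make the world a better place. That is a vague answer, and it has to be put into practice for each specific algorithm or task, but let us hope we succeed!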
References
[1] Jurafsky D, Martin J H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition[M]. Second Edition. Prentice-Hall, 2009.
[2] Jurafsky D. The Language of Food: A Linguist Reads the Menu[M]. Norton, 2014.
[3] Garg N et al. Word embeddings quantify 100 years of gender and ethnic stereotypes[J]. Proceedings of the National Academy of Sciences, 2018, 115(16): E3635-E3644.
[4] Voigt R et al. Language from police body camera footage shows racial disparities in officer respect[J]. Proceedings of the National Academy of Sciences, 2017, 114(25): 6521-6526.

Original English interview transcript
ShannonAI: As you are now editing the third edition of the textbook “Speech and Language Processing”, the most widely used NLP textbook, what is your general feeling about the changes in the NLP field over the past few years? What’s the most exciting part? Or if there is any, what’s the most disappointing part?
Jurafsky: This is such an exciting time to be in the field! I’m of course especially excited by deep learning. What makes me most enthusiastic is the potential for big changes to natural language generation, which is a field that has so much potential but was really marginalized in NLP for so long. And also the ability of modern embeddings (and contextualized embeddings!) to help us model the dynamics of word meaning (across time, across geography, across sentences and texts). The second thing I’m really excited about is the rise of social awareness in the field; both the realization that our models have biases, and their potential to be used to model and understand human social interaction and improve our world.
ShannonAI: Many NLP researchers have a strong linguistics background or even come from linguistics. As deep learning methods become more and more dominant, there is a view, or even a trend, that linguistic knowledge is no longer a necessity in doing NLP research: blindly training a bidirectional LSTM suffices. Could you comment on this point? Furthermore, could you comment on Frederick Jelinek’s famous quote “every time I fire a linguist, the performance of the speech recognizer improves”?
Jurafsky: I’m a big believer that to make contributions to a field, a domain of knowledge, it helps to know something about the domain. So I think it will continue to be crucial for NLP folks to have a deep understanding of how language works, and familiarity with linguistic phenomena, things like reference, compositionality, variation, grammatical structure, connotation and affect, style, conversational interaction. But understanding language and linguistic phenomena doesn’t mean blindly copying inappropriate linguistic models. The Jelinek quote (which Fred told me he phrased more diplomatically as “Anytime a linguist leaves the group the recognition rate goes up”) was actually about pronunciation modeling in speech recognition; it turned out then, and it’s still true now, that given sufficient data, machine learning is simply a better solution for modeling phonetic variation than hand-written phonetic rule cascades.
So I think the field will continue to be what it has been: a beautiful integration of machine learning and linguistic structure and knowledge, and each individual researcher will weight these two important components in different amounts at different times.
ShannonAI: From a historical perspective, major breakthroughs usually took place first in speech processing, and then spread to NLP. For example, in the early 1990s Peter Brown and Robert Mercer, who were from the speech field, introduced statistical machine learning models to NLP and revolutionized the field; deep learning methods brought breakthroughs first in the speech domain, by Deng et al. from Microsoft in 2012, whereas the usage of deep learning at scale in NLP was as late as 2013-2014. In retrospect, do you have a theory why this happens, or is it just a coincidence?
Jurafsky: Yes, it’s true both that statistical models came to NLP from speech, and that deep learning came to NLP from speech and also from vision. I believe that this is not a coincidence at all, that in general innovations come from interstices, when people who work on different areas are thrown together. Studies of Nobel Prize winners have shown that they tend to be bridge-builders, people who connect methods from different fields. So my advice to young scholars is to draw on interdisciplinary ties, talk to your neighbors in different areas. That’s the way in which breakthroughs happen!
ShannonAI: You did 3 years of research on speech processing when you were doing your postdoc. Could you describe how these years of research influenced your NLP research career?
Jurafsky: It was so influential. My postdoc was in 1992-1995, which is exactly when machine learning, probability theory and graphical models, neural networks, and early versions of embeddings were all simultaneously entering NLP, and I was lucky enough to get a postdoc at ICSI Berkeley in a speech recognition and neural network lab, working with Nelson Morgan and Jerry Feldman. That lab had a huge impact on me, and I was very influenced by my advisors in taking a “big tent” view of the NLP field, the idea that you had to think about text, speech, dialogue, and think about cognitive modeling and science just as much as engineering.
We didn’t know at the time whether it would be machine learning in general, or specifically graphical models or in fact neural networks that would become the dominant paradigm. And of course neural networks were so much slower then (no commercially available GPUs), so the lab had to build its own special vector co-processor boards, and what we thought of as enormous speech recognition nets (with one 4000-unit hidden layer) still took forever to train. If you had asked me to guess, I don’t think I would have predicted the 2-decade delayed arrival of deep learning. Amusingly, the first edition of the Jurafsky and Martin textbook described speech recognition only using neural networks; for the second edition we took nets out and described discriminative training of Gaussians instead; and now in the third edition we’re putting the neural networks back!
ShannonAI: In the past you and your students have done a lot of work using NLP techniques to address important issues in social sciences (e.g., Garg et al. PNAS 2018; Voigt et al. PNAS 2017, Winner of Cozzarelli Prize). What are some recommendations you would like to give to NLP researchers if they want to do more interdisciplinary research like these?
Jurafsky: Talk to social scientists! I think it’s so important if you’re going to study anything related to humans to collaborate with experts on humans!! Not only are social scientists trained much more to think about people and social relations, but, compared to computer scientists, they often get far more training in statistical and causal inference. And, once again, it’s interdisciplinarity that leads to innovation!
ShannonAI: In recent years, there have been a lot of concerns about the biases in machine learning. This issue seems to be particularly potent in the field of NLP, since data collected in naturalistic settings (e.g., Twitter) inevitably contains biases. Training deep neural networks to blindly learn from those data would result in biased model predictions. How should we address this problem? (Alternatively, you could answer a related question: how can we use NLP for social good?)
Jurafsky: Yeah, this problem is a side-effect of a really exciting development, which is that NLP tools are finally being used by millions (billions?) of people every day, for MT, IR, recommendations, and so on! So the ethical aspects of our work finally have real consequences, and the young practitioners in the field, as well as the consumers of our science and technology, are really stepping up to look at these consequences.
I’m excited that we’re finally addressing these issues! Maybe we can learn from the tough ethical and social challenges that other fields (medicine, nuclear physics, biology, social science) have long had to learn to wrestle with. You asked what we should do when accuracy trades off with bias, and I think the answer is that we need to always be asking ourselves what is the true higher goal of our work, and what we’re realizing now is that it’s not just about optimizing for accuracy or for speed, but for truly making the world a better place. That’s a vague answer, and has to be contextualized for any individual algorithm or task, but let’s hope we succeed!