ACL 2019 Conference Summary

My colleague Ananda and I attended the ACL 2019 conference in the enchanting city of Florence. All the accepted papers can be accessed here. Here’s a summary of the interesting trends and the specific research work that caught my eye at the conference. A note of thanks to my employer, Zoho, for sponsoring us to attend.

I wrote this summary many months ago and forgot to post it. Better late than never, I guess.

Grammatical Error Correction

Among the ACL workshops, the Building Educational Applications (BEA) Workshop had a Grammatical Error Correction competition.

The system description papers for this competition were presented as posters at the conference.

The competition had three tracks:
Restricted track - Only the organizer-provided, human-labelled parallel data (pairs of erroneous and corrected sentences) can be used. There is no restriction on synthetic data.
Unrestricted track - Any data, including private data, can be used.
Low Resource track - No human-labelled data can be used.
Interestingly, the Track 1 (Restricted) submission from the winning team (Edinburgh + Microsoft) also beat the Track 2 (Unrestricted) results without using any data beyond what the restricted track allows.
Overall, the top-performing teams relied on synthetic data generated by corrupting grammatical sentences from news, books and Wikipedia.
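
A minimal sketch of the corruption idea, not any team's actual pipeline: clean sentences are randomly damaged to produce (noisy, clean) training pairs. The function names, the three edit types (delete, duplicate, swap) and the probability `p` are all assumptions made for illustration; real systems use richer, error-type-aware noise.

```python
import random


def corrupt(tokens, rng, p=0.3):
    """Return a noisy copy of `tokens` using random delete/duplicate/swap edits.

    Hypothetical noise model for illustration only.
    """
    out = []
    i = 0
    while i < len(tokens):
        r = rng.random()
        if r < p / 3:
            pass  # delete the current token
        elif r < 2 * p / 3:
            out.extend([tokens[i], tokens[i]])  # duplicate the current token
        elif r < p and i + 1 < len(tokens):
            out.extend([tokens[i + 1], tokens[i]])  # swap with the next token
            i += 1
        else:
            out.append(tokens[i])  # keep the token unchanged
        i += 1
    return out


def make_pairs(sentences, seed=0, p=0.3):
    """Build (corrupted, original) sentence pairs from clean sentences."""
    rng = random.Random(seed)
    return [(" ".join(corrupt(s.split(), rng, p)), s) for s in sentences]


if __name__ == "__main__":
    clean = ["She has been living in Florence for two years ."]
    for noisy, target in make_pairs(clean, seed=1):
        print("source:", noisy)
        print("target:", target)
```

A correction model would then be trained to map the noisy source back to the clean target, in the same way as with human-labelled pairs.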

Multi-Lingual Models

Multilingual models are a hot area of research now. Earlier work that used a single model to perform tasks across multiple languages has shown promising results.

Lots of papers on multi-lingual shared models were presented.
Paper - Choosing Transfer Languages for Cross-Lingual Learning

Rise of Automated Metrics

Until recently, we compared model outputs with human-written sentences for translation, summarization, etc. This can artificially penalize models that generate sentences with an equivalent meaning but different words. There are a couple of papers that train models to score the quality of the output and then use these model scores as the reward for reinforcement learning. (FYI, reinforcement learning is only used for fine-tuning; none of the seq2seq models can be trained from scratch using it.)

Paper - This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation. This paper uses an automated score instead of the typical n-gram match (ROUGE) score for the summarization task.
Paper - Beyond BLEU: Training Neural Machine Translation with Semantic Similarity
Paper - Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts
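
To illustrate why word-overlap scoring penalizes paraphrases, here is a small sketch of clipped n-gram precision, the building block behind BLEU-style metrics. It is a simplification (no brevity penalty, a single reference, whitespace tokenization), and the example sentences are made up for illustration.

```python
from collections import Counter


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of `candidate` against a single `reference`."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return matched / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    paraphrase = "a feline rested upon the rug"
    print(ngram_precision(reference, reference))   # exact copy of the reference
    print(ngram_precision(paraphrase, reference))  # same meaning, different words
```

The paraphrase shares only the word "the" with the reference, so it scores far lower than an exact copy even though the meaning is preserved. A learned scoring model is meant to close exactly this gap.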

Statistical Evaluation

If we have two architectures and a couple of datasets, how can we say empirically that one is better than the other? Some of the questions are how to compare two models on the same dataset, across multiple datasets, and across various hyperparameter configurations. The problem with applying frequentist tests to metrics such as accuracy and F1 score is that assumptions such as independent and identically distributed (IID) samples cannot be made for deep learning datasets. So we cannot assume that the score a model gets on one dataset is “independent” of its score on another dataset. Statistical tests that don’t assume an underlying distribution are needed. Statistical methods and tests for this are being developed, and some were presented at the conference.
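
As one example of a test that does not assume an underlying distribution, here is a sketch of a paired permutation (sign-flip) test for two models evaluated on the same test set. This is a standard textbook procedure, not necessarily one of the methods presented at the conference, and the function name and parameters are my own. It only covers the single-dataset case and still treats test examples as exchangeable, so it does not address the cross-dataset dependence discussed above.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided p-value for 'models A and B perform the same'.

    scores_a / scores_b are per-example scores (e.g. 1 for correct, 0 for wrong)
    of the two models on the SAME examples, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same examples")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, the sign of each paired difference is arbitrary.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    model_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
    model_b = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
    print(paired_permutation_test(model_a, model_b))
```

A small p-value suggests that the observed gap between the two models is unlikely to come from chance pairing alone.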

Bayesian Methods

I attended a very detailed tutorial on this topic. The presenter summarized the evolution of research in this area and the current papers. Here’s a link to the detailed slides for fellow Bayesians.

Analyzing Neural Nets and Interpretability

There is an entire sub-field of research devoted to analyzing and interpreting neural networks.

BERTology

“BERT-ology” papers explore what linguistic structures pre-trained models like BERT learn.

Paper - What Does BERT Look at? An Analysis of BERT’s Attention

BlackBoxNLP Workshop

An entire workshop was devoted to analyzing what neural networks learn.

Paper - On the Realization of Compositionality in Neural Networks. An interesting paper studying what is required for neural models to compose two very trivial functions.
Paper - GEval: Tool for Debugging NLP Datasets and Models

Formal Languages Workshop

A small workshop was devoted entirely to finding out which formal languages (finite state automata, etc.) neural networks can learn. For example, can we reduce an RNN to a weighted finite state machine (which is far more interpretable, amenable to theory, etc.)? Although this area sounds exciting to me, I was unable to attend as I was in another workshop. Slides from Noah Smith’s talk on Rational Recurrences at this workshop.

Neuroscience and NLP

Neuroscience labs have started to use deep learning. Interesting research at the intersection of NLP and neuroscience, correlating ANN representations with brain signals, was presented.

Paper - Relating Simple Sentence Representations in Deep Neural Networks and the Brain. The researchers try to find relationships between deep learning language representations and brain signals. Of particular interest is the part where they predict brain activity patterns using pre-trained ANN models like BERT.

Language Emergence in Multi-Agent systems

In this frontier, people try to train models to solve some task by communicating through symbols. Researchers analyze the properties of the language the agents use to solve the task and how it compares with the properties of human language.

Paper - Word-order Biases in Deep-agent Emergent Communication

Conversational AI

Neural models for selecting conversations from past history, detecting intent and slot filling are all increasingly being deployed by companies.
PolyAI (a startup in Singapore shipping conversational AI) shared three interesting papers. Their slides are also interesting.
On a related note, Baidu is doing impressive research and engineering on meeting transcription. They have a stack that does speech to text, translates the text as it’s spoken (a problem that needed separate research, as the text would be incomplete), detects English phrases being spoken (code switching) and then runs NLP over the transcribed text.

Translation

Lots of new work on adapting translation models for low-resource languages.
Unsupervised translation and multilingual translation models are a few of the active areas of research.
Unbabel, a YC-funded startup building translation systems, shared lots of interesting and important results. Slides from their talk. This company employs a hybrid system where human translators do “post-edits” on machine translations, and some of their systems work in real time.

Contextual Search using Neural Representations at scale

This paper demonstrates a system that performs dense vector search over the entire Wikipedia for open-domain QA.

Scaling search over neural vectors to do question answering on the entire Wikipedia on a CPU - https://github.com/uwnlp/denspi

Demo - http://allgood.cs.washington.edu:15001/
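
The core retrieval step behind dense vector search can be sketched in a few lines. This is a toy illustration and not the DenSPI code or API: the three-dimensional vectors below are hand-written stand-ins for the output of a neural encoder, and a real system would use an approximate nearest-neighbour index rather than a linear scan.

```python
import heapq


def top_k(query_vec, index, k=2):
    """Return the k (score, text) pairs with the highest inner product to the query.

    `index` is a list of (text, vector) pairs; the vectors here are toy values.
    """
    scored = (
        (sum(q * v for q, v in zip(query_vec, vec)), text)
        for text, vec in index
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Hypothetical pre-computed phrase vectors (in practice produced by an encoder).
    phrase_index = [
        ("Florence", [0.9, 0.1, 0.0]),
        ("2019", [0.1, 0.9, 0.0]),
        ("Tuscany", [0.7, 0.2, 0.1]),
    ]
    # Hypothetical encoding of a question such as "Where was ACL 2019 held?"
    query = [1.0, 0.0, 0.0]
    for score, text in top_k(query, phrase_index):
        print(f"{score:.2f}  {text}")
```

Because the candidate vectors are computed ahead of time, answering a question reduces to encoding the query once and running a nearest-neighbour lookup, which is the kind of workload that can be served from a CPU.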