Building multilingual semantic parsers using undirected graphical models
Abstract: The task of answering natural language questions over RDF data has received wide interest in recent years, in particular in the context of the series of QALD benchmarks. The task consists of mapping a natural language question to an executable form, e.g. SPARQL, so that answers can be extracted from a given knowledge base (KB). So far, most proposed systems are i) monolingual and ii) rely on a set of hard-coded rules to interpret questions and map them into a SPARQL query. We present the first multilingual QALD pipeline that induces a model from training data for mapping a natural language question into logical form as probabilistic inference. In particular, our approach learns to map universal syntactic dependency representations to a language-independent logical form based on DUDES (Dependency-based Underspecified Discourse Representation Structures), which are then mapped to a SPARQL query in a deterministic second step. Our model builds on factor graphs that rely on features extracted from the dependency graph and the corresponding semantic representations. We rely on approximate inference techniques, in particular Markov chain Monte Carlo methods, as well as SampleRank to update parameters using a ranking objective. Our focus lies on developing methods that overcome the lexical gap, and we present a novel combination of machine translation and word embedding approaches for this purpose. As a proof of concept, we evaluate our approach on the QALD-6 datasets for English, German, and Spanish.
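The SampleRank update mentioned in the abstract can be illustrated with a minimal sketch: whenever the model's score ranks two candidate states contrary to an external objective (e.g. answer F1 against the gold query), the weights are shifted toward the features of the objectively better state. The feature dictionaries and objective scores below are hypothetical placeholders, not the paper's actual feature set.

```python
def dot(weights, feats):
    """Linear model score: dot product of weight and feature dictionaries."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def sample_rank_update(weights, feats_a, feats_b, obj_a, obj_b, lr=0.1):
    """One SampleRank step over a pair of candidate states.

    If the model's ranking of the two states disagrees with the
    objective's ranking, perform a perceptron-style update toward
    the objectively better state's features.
    """
    model_prefers_a = dot(weights, feats_a) > dot(weights, feats_b)
    objective_prefers_a = obj_a > obj_b
    if model_prefers_a != objective_prefers_a:
        better, worse = (feats_a, feats_b) if objective_prefers_a else (feats_b, feats_a)
        for k in set(better) | set(worse):
            weights[k] = weights.get(k, 0.0) + lr * (better.get(k, 0.0) - worse.get(k, 0.0))
    return weights

# Hypothetical usage: two candidate logical forms with one feature each.
w = sample_rank_update({}, {"lex:match": 1.0}, {"lex:miss": 1.0}, obj_a=1.0, obj_b=0.0)
```

After the update, the model scores the objectively better candidate higher, which is the ranking behavior SampleRank enforces during MCMC-based inference.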