PhD Dissertation Talk, Semantic Computing Group, CITEC, Bielefeld University, Germany
Abstract: The task of answering natural language questions over structured data has received wideinterest in recent years. Structured data in the form of knowledge bases has been availablefor public usage with coverage on multiple domains. DBpedia and Freebase are such knowl-edge bases that include encyclopedic data about multiple domains. However, querying suchknowledge bases requires an understanding of a query language and the underlying ontology,which requires domain expertise. Querying structured data via question answering systemsthat understand natural language has gained popularity to bridge the gap between the dataand the end user.In order to understand a natural language question, a question answering system needsto map the question into query representation that can be evaluated given a knowledge base.An important aspect that we focus in this thesis is the multilinguality. While most researchfocused on building monolingual solutions, mainly English, this thesis focuses on buildingmultilingual question answering systems. The main challenge for processing language inputis interpreting the meaning of questions in multiple languages.In this thesis, we present three different semantic parsing approaches that learn modelsto map questions into meaning representations, into a query in particular, in a supervisedfashion. Each approach differs in the way the model is learned, the features of the model, theway of representing the meaning and how the meaning of questions is composed. The firstapproach learns a joint probabilistic model for syntax and semantics simultaneously from thelabeled data. The second method learns a factorized probabilistic graphical model that buildson a dependency parse of the input question and predicts the meaning representation that isconverted into a query. The last approach presents a number of different neural architecturesthat tackle the task of question answering in end-to-end fashion. We evaluate each approachusing publicly available datasets and compare them with state-of-the-art QA systems.