The 2021 Alonzo Church Award for Outstanding Contributions to Logic and Computation is given jointly to Georg Gottlob, Christoph Koch, Reinhard Pichler, Klaus U. Schulz, and Luc Segoufin for their fundamental work on logic-based web data extraction and querying tree-structured data
The ACM Special Interest Group on Logic (SIGLOG), the European Association for Theoretical Computer Science (EATCS), the European Association for Computer Science Logic (EACSL), and the Kurt Goedel Society (KGS) are pleased to announce that Georg Gottlob, Christoph Koch, Reinhard Pichler, Klaus U. Schulz, and Luc Segoufin have been selected as the winners of the 2021 Alonzo Church Award for Outstanding Contributions to Logic and Computation for fundamental work on logic-based web data extraction and querying tree-structured data, published in:
- Georg Gottlob and Christoph Koch. Monadic Datalog and the Expressive Power of Languages for Web Information Extraction. Journal of the ACM (JACM) 51.1, pp. 74-113, 2004, DOI: 10.1145/962446.962450.
- Georg Gottlob, Christoph Koch, and Klaus U. Schulz. Conjunctive Queries Over Trees. Journal of the ACM (JACM) 53.2, pp. 238-272, 2006, DOI: 10.1145/1131342.1131345.
- Georg Gottlob, Christoph Koch, and Reinhard Pichler. Efficient Algorithms for Processing XPath Queries. ACM Transactions on Database Systems (TODS) 30.2, pp. 444-491, 2005, DOI: 10.1145/1071610.1071614.
- Georg Gottlob, Christoph Koch, Reinhard Pichler, and Luc Segoufin. The Complexity of XPath Query Evaluation and XML Typing. Journal of the ACM (JACM) 52.2, pp. 284-335, 2005, DOI: 10.1145/1059513.1059520.
The Contribution
Paper (1) establishes a comprehensive logical theory of Web data extraction. At its core, this is the problem of selecting relevant nodes (subtrees) from HTML text. While the set of relevant nodes can be expressed in Monadic Second-Order logic (MSO) over finite trees, MSO has high computationally complexity. The authors prove that Monadic Datalog on trees has exactly the same expressive power as full MSO and that, surprisingly, evaluating Monadic Datalog is feasible in time linear in the size of query and input tree. These results greatly influenced theoretical and applied research, and gave rise to logic-based systems for data extraction that have been successfully used in industry.
Papers (2,3,4) present deep investigations into logical queries over tree-structured data. The complexity of evaluating XPath, a key technology in Web browsers and other systems, was unclear, and available implementations required exponential time. Paper (2) gives a full characterization of, and a dichotomy theorem for, the complexity of conjunctive queries on various representations of trees. Paper (3) shows that the full XPath standard can be evaluated in PTIME and proposes a logical core which has become seminal to research efforts at the intersection of Web data processing and (modal) logics. Finally, paper (4) establishes the precise complexity of evaluating XPath fragments.