Data-Driven Linguistic Ontology Development

Funded by the NSF (#0411348)

 

William Lewis, CSU Fresno, PI

Scott Farrar, Universität Bremen, Contractor

 

The Supplement

 

In December 2004, a supplement to the Data-Driven Linguistic Ontology Development (DDLOD) grant was approved by NSF. The purpose of the supplement is to significantly increase the amount of interlinear glossed text (IGT) being mined from the Web, and to make the actual instances of IGT accessible to and searchable by the linguistics community. Instances of IGT discovered on the Web will be migrated to a best-practice XML format and housed in the Online Database of INterlinear text, or ODIN. Each instance of XML-encoded IGT will contain the fully aligned source data, with pointers to source documents, and will include full citations where possible. The semantics of grammatical notions will be resolved in terms of the General Ontology for Linguistic Description (GOLD). Initially, ODIN will be organized by language, which will enable searching using the OLAC search interface available at LinguistList (see http://www.linguistlist.org/olac/index.html).
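To make the idea of an aligned, XML-encoded IGT instance concrete, here is a minimal sketch in Python. It is not the ODIN format: the element and attribute names (`igt`, `phrase`, `word`, `morphemes`, `gloss`, `translation`, `lang`, `source`), the helper functions, and the example URL are all assumptions made up for illustration. It only shows how word, morpheme, and gloss tiers might be kept aligned, how a pointer to the source document might be stored, and how instances might be filtered by language.

```python
import xml.etree.ElementTree as ET


def igt_to_xml(language, source_url, words, morphemes, glosses, translation):
    """Build one hypothetical <igt> element (names are illustrative only)."""
    # The three tiers must align one-to-one, word by word.
    if not (len(words) == len(morphemes) == len(glosses)):
        raise ValueError("word, morpheme, and gloss tiers must align one-to-one")

    # Language code and pointer to the source document are stored as attributes.
    igt = ET.Element("igt", {"lang": language, "source": source_url})
    phrase = ET.SubElement(igt, "phrase")
    for word, morph, gloss in zip(words, morphemes, glosses):
        w = ET.SubElement(phrase, "word", {"form": word})
        ET.SubElement(w, "morphemes").text = morph
        ET.SubElement(w, "gloss").text = gloss
    ET.SubElement(igt, "translation").text = translation
    return igt


def by_language(instances, language):
    """Return the instances whose (hypothetical) lang attribute matches."""
    return [inst for inst in instances if inst.get("lang") == language]


if __name__ == "__main__":
    example = igt_to_xml(
        language="deu",
        source_url="http://example.org/paper.html",  # placeholder, not a real source
        words=["Der", "Hund", "schläft"],
        morphemes=["der", "Hund", "schläf-t"],
        glosses=["the.NOM", "dog", "sleep-3SG"],
        translation="The dog sleeps.",
    )
    print(ET.tostring(example, encoding="unicode"))
    print(len(by_language([example], "deu")))
```

Grammatical labels such as `NOM` or `3SG` are kept here as plain strings; mapping them to GOLD concepts would be a separate step that this sketch does not attempt.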

 

Our current plan of action is as follows:

  1. (Early Spring 2005) Links to resources that contain IGT will be provided through the OLAC search interface.  The lists of URLs will be searchable by language.

  2. (Late Spring 2005) Actual IGT data will be incrementally brought online.  Data will be rendered in a traditional format, and will contain the URLs to the source documents.  Again, access will be provided through OLAC, and the data will be searchable by language.

  3. (Summer 2005) The search facility will be enhanced to allow searching by a limited set of mostly morphosyntactic grammatical notions in addition to language. (Currently being tested - 9/30/05.)

Check back here in the coming months as work on this project progresses.