Object Extraction From Presentation-oriented Documents Using A Semantic And Spatial Approach
Ermelinda Oro, Massimo Ruffolo

Type: Industrial invention patent
Title: Object extraction from presentation-oriented documents using a semantic and spatial approach
Publication year: 2017
Patent type: International
Number: US9582494 B2
Country/countries of filing: US
Language: English

Description: Automatic extraction of objects in a presentation-oriented document comprises receiving the presentation-oriented document (POD) in which content elements are spatially arranged in a given layout organization for presenting contents to human users; receiving a set of descriptors that semantically define the objects to extract from the POD based on attributes comprising the objects; using the set of descriptors to identify content elements in the POD that match the attributes in the set of descriptors defining the objects, and assigning semantic annotations to the identified elements based on the descriptors; creating a semantic and spatial document model (SSDM) containing spatial structures of the identified content elements in the POD and the semantic annotations assigned to the identified content elements; extracting the identified content elements from the POD based on the set of descriptors and the SSDM to create a set of object instances; and performing at least one of: i) using the object instances to generate semantic and spatial wrappers that can be reused on a different POD, and ii) storing the object instances in a data repository.
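
The steps in the description (match descriptors, annotate, build a combined semantic and spatial model, extract object instances) can be illustrated with a small Python sketch. This is not the patented method: every name here (ContentElement, Descriptor, annotate, build_ssdm, extract_objects) is hypothetical, regular expressions stand in for the semantic descriptors, and grouping elements into rows by vertical position stands in for the SSDM. The sample coordinates and texts are made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentElement:
    """A piece of text with its position on the page (hypothetical model)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Descriptor:
    """Assumption: a descriptor is an attribute name plus a regex."""
    attribute: str
    pattern: str


@dataclass(frozen=True)
class AnnotatedElement:
    element: ContentElement
    annotation: str


def annotate(elements, descriptors):
    """Assign the first matching descriptor's attribute to each element."""
    annotated = []
    for el in elements:
        for d in descriptors:
            if re.fullmatch(d.pattern, el.text):
                annotated.append(AnnotatedElement(el, d.attribute))
                break
    return annotated


def build_ssdm(annotated, row_tolerance=5.0):
    """Stand-in for the SSDM: group annotated elements into visual rows."""
    rows = []
    for item in sorted(annotated, key=lambda a: (a.element.y, a.element.x)):
        # Compare against the first element of the current row.
        if rows and abs(rows[-1][0].element.y - item.element.y) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [sorted(row, key=lambda a: a.element.x) for row in rows]


def extract_objects(ssdm, descriptors):
    """Keep only rows that contain every attribute named by the descriptors."""
    required = {d.attribute for d in descriptors}
    objects = []
    for row in ssdm:
        instance = {a.annotation: a.element.text for a in row}
        if required <= instance.keys():
            objects.append(instance)
    return objects


# Made-up sample: a heading and two name/price rows.
elements = [
    ContentElement("Menu", 10, 20),
    ContentElement("Espresso", 10, 100),
    ContentElement("$2.50", 200, 101),
    ContentElement("Latte", 10, 130),
    ContentElement("$3.75", 200, 130),
]
descriptors = [
    Descriptor("name", r"[A-Z][a-z]+"),
    Descriptor("price", r"\$\d+\.\d{2}"),
]

objects = extract_objects(build_ssdm(annotate(elements, descriptors)), descriptors)
print(objects)
```

In this sketch the heading "Menu" matches the name pattern but shares its row with no price, so it is dropped. That is the point of combining the two signals: the semantic match alone would accept it, and the spatial grouping rules it out.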

Keywords:

  • Information Extraction
  • Presentation-oriented document
  • semantic method
  • spatial approach

