2009/03/31

clustering by tree distance for parse tree normalisation

directed study module

Aim of this directed study module

In various areas of what might be termed intelligent language processing, the questions of categorising units and of ranking units for similarity arise.

In Information Extraction, units have to be recognised in sentences that can be seen as filling roles in database entries in some domain of particular interest (business events, terrorist incidents). So units have to be found and given categories such as company, date, weapon etc. Systems for such categorisations are sometimes called Named Entity Recognisers.

In Question Answering, given a natural language question (eg. what command will copy a directory?), an answering natural language sentence is sought within a corpus of natural language text (such as a manual: command cp Dir1 Dir2 will copy the directory Dir1 to location Dir2). Roughly this can be seen as categorising the corpus sentences into the 2 categories of being an answer and not being an answer. More realistically it can be seen as ranking the corpus sentences for their likelihood of being an answer to the question, and this ranking might take the form of computing a similarity between the answer and the question. Often within an overall question-answering system there are sub-systems whose concern is the categorisation of phrases. So units within the answer corpus might be sought and categorised (such as location, person etc), and similar or identical categories might be assigned to questions (where was ...: location, who ...: person) and might be used to influence the selection or ranking of answers.

The techniques that could be (and have been) used for such categorisation and similarity-ranking tasks might be classified along the following two dimensions:

Structural
whether the method makes reference to linguistic structures or not (for example, methods that refer to little more than word-count statistics make no reference to linguistic structures)
Machine-Learning
whether the method is hand-engineered, or instead is a method which uses training data to define the parameters and behaviour of a classifier.

The aim of this directed study module will be to look at techniques for categorisation and similarity ranking which (i) do make reference to linguistic structures and (ii) do make use of machine-learning methods, whilst making comparisons with techniques which do not share these characteristics.

--------------


Possible Plan

Within the area of techniques which do make reference to linguistic structure, and are not hand-engineered, the main contenders are:

Tree Distance
defining a 'distance' $d(S,T)$ between two trees $S$ and $T$, basically representing what is left out and relabelled in an ancestry- and linearity-preserving mapping between $S$ and $T$. For identical trees, the distance $d(T,T)$ is 0. [PRtY04,ZSW+05,KM05,Emm06a,Emm06b]
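
To make the notion concrete, here is a minimal memoised sketch of ordered-tree edit distance, with trees as (label, children) tuples and unit costs assumed; the cited papers use faster dynamic-programming algorithms (e.g. Zhang-Shasha), so this is an illustration, not their method.

    from functools import lru_cache

    # trees are (label, children) pairs; children is a tuple of trees
    def cost_del(v):    return 1          # cost of leaving out v (assumed)
    def cost_ins(w):    return 1          # cost of inserting w (assumed)
    def cost_rel(v, w): return 0 if v[0] == w[0] else 1   # relabelling cost

    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        """Edit distance between two ordered forests (tuples of trees)."""
        if not f1 and not f2:
            return 0
        if not f1:                        # only insertions remain
            w = f2[-1]
            return forest_dist(f1, f2[:-1] + w[1]) + cost_ins(w)
        if not f2:                        # only deletions remain
            v = f1[-1]
            return forest_dist(f1[:-1] + v[1], f2) + cost_del(v)
        v, w = f1[-1], f2[-1]             # rightmost roots
        return min(
            forest_dist(f1[:-1] + v[1], f2) + cost_del(v),   # leave out v
            forest_dist(f1, f2[:-1] + w[1]) + cost_ins(w),   # leave out w
            forest_dist(f1[:-1], f2[:-1])                    # map v to w
                + forest_dist(v[1], w[1]) + cost_rel(v, w),
        )

    def tree_dist(s, t):
        return forest_dist((s,), (t,))

    S = ("S", (("NP", ()), ("VP", (("V", ()),))))
    T = ("S", (("NP", ()), ("VP", ())))
    assert tree_dist(S, S) == 0           # identical trees: distance 0
    assert tree_dist(S, T) == 1           # the V node is left out

The mapping explored is ancestry- and linearity-preserving because the recursion only ever matches rightmost roots against each other and recurses into their child forests in order.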

Tree Kernels
defining a so-called 'kernel' function $k(S,T)$ between two trees $S$ and $T$, which effectively represents a vector product $\sigma(S) \bullet \sigma(T)$ on images of the trees projected into a space, each dimension of which represents a distinct substructure. 'Kernel' functions are often thought of as 'similarity' functions: for identical trees, the 'similarity' $k(T,T)$ is large, and for trees with no shared substructures, $k(S,T)$ is 0. [ZAR03,ZL03,QMMB07,MGCB05,BM07]
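
For instance, a minimal sketch in the style of the Collins-Duffy subset-tree kernel, reusing the (label, children) tuple representation above; the decay value $\lambda$ is an assumed parameter:

    LAMBDA = 0.5   # decay factor down-weighting large substructures (assumed)

    def nodes(t):
        yield t
        for c in t[1]:
            yield from nodes(c)

    def production(t):
        return (t[0], tuple(c[0] for c in t[1]))

    def common(n1, n2, lam=LAMBDA):
        """Weighted count of substructures rooted at both n1 and n2."""
        if production(n1) != production(n2):
            return 0.0
        r = lam
        for c1, c2 in zip(n1[1], n2[1]):   # same production => same arity
            r *= 1.0 + common(c1, c2, lam)
        return r

    def tree_kernel(s, t, lam=LAMBDA):
        # k(S,T): sum over all node pairs of their shared substructures
        return sum(common(n1, n2, lam) for n1 in nodes(s) for n2 in nodes(t))

Here $k(T,T)$ grows with the size of $T$, while trees sharing no production give 0, matching the behaviour described above.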

Support Vector Machine Classification
using training data on which a kernel function $k$ is defined and which fall into 2 categories, a hyperplane is effectively found, defined by a normal vector ${\bf w}$ and an origin offset $b$, dividing the space of points according to


${\bf w} \bullet {\bf x} + b \gtrless 0$

Typically ${\bf w}$ is not directly found; instead the quantity ${\bf w} \bullet {\bf x} + b$ is expressed by a summation involving the kernel function $k$ and a finite subset of the training points, the so-called support vectors.
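
As a sketch of that expansion (names are illustrative, not a particular library's API):

    def svm_decision(x, svs, alphas, ys, b, k):
        """w . x + b written as sum_i alpha_i y_i k(sv_i, x) + b,
        where svs are the support vectors and ys their +1/-1 categories."""
        return sum(a * y * k(sv, x) for sv, a, y in zip(svs, alphas, ys)) + b

The category assigned to $x$ is then the sign of the returned value, so a tree kernel such as the sketch above can be plugged in as $k$ without ever constructing ${\bf w}$ explicitly.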

k-NN classification
assuming a distance function $d$ which can be calculated between a test item and any member of a set of classified training data, rank the training data in order of increasing distance from the test item. In 1-NN classification, return as the category of the test item the category of its nearest neighbour. In k-NN classification, find the $k$ nearest neighbours, and treat this as a panel of $k$ votes for various categories, finally applying some weighted voting scheme to derive a winning category from the panel.
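
A minimal sketch of such a classifier (the inverse-distance weighting is one assumed choice among many):

    from collections import defaultdict

    def knn_classify(x, training, d, k=3):
        """training: list of (item, category) pairs; d: distance function."""
        nearest = sorted(training, key=lambda ic: d(x, ic[0]))[:k]
        votes = defaultdict(float)
        for item, cat in nearest:
            votes[cat] += 1.0 / (1.0 + d(x, item))   # assumed weighting scheme
        return max(votes, key=votes.get)

With k=1 this reduces to returning the nearest neighbour's category; with tree_dist as $d$ it becomes a structural classifier of the kind the module will examine.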

I would propose a plan in which, first, materials are studied to make sure that the above 4 concepts are reasonably well understood, followed by particular papers in which various combinations of these concepts have been applied to categorisation and ranking tasks.

Here's a first (and still unfinished) draft of the papers/materials to be looked at

k-NN and distance functions

WHAT TO READ ? STILL DECIDING

SVMs and tree kernels

WHAT TO READ ? STILL DECIDING

Question Categorisation

ZL03
Dell Zhang and Wee Sun Lee. 
Question classification using support vector machines. 
In SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 26-32, New York, NY, USA, 2003. ACM.

QMMB07
Silvia Quarteroni, Alessandro Moschitti, Suresh Manandhar, and Roberto Basili. 
Advanced structural representations for question classification and answer re-ranking. 
In Advances in Information Retrieval, proceedings of ECIR 2007. Springer, 2007.

Question Answering and Text Entailment

PRtY04
Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 
Natural language inference via dependency tree mapping: An application to question answering. 
Computational Linguistics, 2004.

KM05
Milen Kouylekov and Bernardo Magnini. 
Recognizing textual entailment with tree edit distance algorithms. 
In Ido Dagan, Oren Glickman, and Bernardo Magnini, editors, Pascal Challenges Workshop on Recognising Textual Entailment, 2005.

Emm06b
Martin Emms. 
Variants of tree similarity in a question answering task. 
In Proceedings of the Workshop on Linguistic Distances, held in conjunction with COLING 2006, pages 100-108, Sydney, Australia, July 2006. Association for Computational Linguistics.

Relation Extraction

ZAR03
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 
Kernel methods for relation extraction. 
Journal of Machine Learning Research, 3:1083-1106, 2003.

ZSW+05
Min Zhang, Jian Su, Danmei Wang, Guodong Zhou, and Chew Lim Tan. 
Discovering relations between named entities from a large raw corpus using tree similarity-based clustering. 
In IJCNLP, 2005.

MGCB05
Alessandro Moschitti, Ana-Maria Giuglea, Bonaventura Coppola, and Roberto Basili. 
Hierarchical semantic role labeling. 
In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005 shared task), 2005.

BM07
Stephan Bloehdorn and Alessandro Moschitti. 
Combined syntactic and semantic kernels for text classification. 
In Gianni Amati, Claudio Carpineto, and Gianni Romano, editors, Advances in Information Retrieval - Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007), 2-5 April 2007, Rome, Italy, volume 4425 of Lecture Notes in Computer Science, pages 307-318. Springer, April 2007.

------------------

Bibliography

BM07
Stephan Bloehdorn and Alessandro Moschitti. 
Combined syntactic and semantic kernels for text classification. 
In Gianni Amati, Claudio Carpineto, and Gianni Romano, editors, Advances in Information Retrieval - Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007), 2-5 April 2007, Rome, Italy, volume 4425 of Lecture Notes in Computer Science, pages 307-318. Springer, April 2007.

Emm06a
Martin Emms. 
Clustering by tree distance for parse tree normalisation. 
In Proceedings of NLUCS 2006, pages 91-100, 2006.

Emm06b
Martin Emms. 
Variants of tree similarity in a question answering task. 
In Proceedings of the Workshop on Linguistic Distances, held in conjunction with COLING 2006, pages 100-108, Sydney, Australia, July 2006. Association for Computational Linguistics.

KM05
Milen Kouylekov and Bernardo Magnini. 
Recognizing textual entailment with tree edit distance algorithms. 
In Ido Dagan, Oren Glickman, and Bernardo Magnini, editors, Pascal Challenges Workshop on Recognising Textual Entailment, 2005.

MGCB05
Alessandro Moschitti, Ana-Maria Giuglea, Bonaventura Coppola, and Roberto Basili. 
Hierarchical semantic role labeling. 
In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005 shared task), 2005.

PRtY04
Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 
Natural language inference via dependency tree mapping: An application to question answering. 
Computational Linguistics, 2004.

QMMB07
Silvia Quarteroni, Alessandro Moschitti, Suresh Manandhar, and Roberto Basili. 
Advanced structural representations for question classification and answer re-ranking. 
In Advances in Information Retrieval, proceedings of ECIR 2007. Springer, 2007.

ZAR03
Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 
Kernel methods for relation extraction. 
Journal of Machine Learning Research, 3:1083-1106, 2003.

ZL03
Dell Zhang and Wee Sun Lee. 
Question classification using support vector machines. 
In SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 26-32, New York, NY, USA, 2003. ACM.

ZSW+05
Min Zhang, Jian Su, Danmei Wang, Guodong Zhou, and Chew Lim Tan. 
Discovering relations between named entities from a large raw corpus using tree similarity-based clustering. 
In IJCNLP, 2005.



Martin Emms 2009-02-09



Richard Dawkins criticises religious intolerance on receiving an honorary doctorate from the Universitat de València

Web Information Retrieval: Spam Detection and Named Entity Recognition

2-hour seminar

by Marcin Sydow

Query decomposition

 

Topical query decomposition

 

Query in red

Do blue set

 

Quality of decomposition

Cost: low topical scattering

Red fraction low (small quantity of documents)

Inter-query overlap (well decomposed)

Uncovered (try to cover all of the query).

 

Annealing function

Solutions are neighbors if they differ by one query.
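
A tiny sketch of that neighborhood relation, assuming a solution is a set of queries (all names are illustrative, not the speaker's code):

    def neighbors(solution, candidate_queries):
        """Solutions differing from `solution` by exactly one query."""
        drop = [solution - {q} for q in solution]
        add = [solution | {q} for q in candidate_queries if q not in solution]
        return drop + add

    # e.g. neighbors({"jazz piano"}, {"jazz piano", "jazz guitar", "jazz sax"})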

 

Properties:

Quasi-transitive (temperature preference)

Utility function

Social preferences

Majority voting is not transitive (e.g. three voters ranking A>B>C, B>C>A, C>A>B give majority preferences A>B, B>C, C>A)

Single peakedness

Arrow's social welfare function

 

We like:

Arrow’s impossibility theorem.

2009/03/30

english

studying English, 
phrasal verbs
checking how to explain things...

reading group

Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning.
presented by me.

note:
my supervisor attended the first part of the explanation.

searching for books for the Spanish lectures

2009/03/29

Tree Kernels for semantic role labeling
reading paper

Tree Kernels for semantic role labeling (32 pages)

0.   Abstract

This paper is about using kernel functions to model parse-tree properties in kernel-based machines.

1.   Introduction

Recent attention: automatic labeling of semantic roles. It requires:

1. Detection of the target predicate

2. Detection and classification of constituting arguments.

Previous work:

-          Machine learning

-          Linking theories of semantics and syntax

Tree kernel for:

-          Parsing re-ranking

-          Relation extraction

-          Named entity recognition.

Recent work achieves high labeling accuracy.

Joint inference on predicate-argument structure. For this -> extract features from the sentence's syntactic parse tree.

1-      Feature extraction.

2-      Combination with traditional attribute value models.

3-      Feature engineering using kernel combinations and marking strategies.

4-      Tree kernels….

5-      Identify the most important structured features: highest-weighted features -> better classifiers.

“Tree kernels should always be applied”.

Limitations:

a)      Poor impact on boundary detection.

b)      Limited contribution to overall accuracy.

Tree kernels for SRL: types:

-          Canonical mappings

-          Feature extraction functions

Carefully engineered tree kernels always increase accuracy.

 

2.   Automatic shallow semantic parsing

3.   Tree kernels

4.   State of the art architecture for SRL

5.   Structured feature engineering

6.   Experiments

7.   Discussion and conclusions

studying English

2009/03/28

Semantic Role Labeling: An Introduction to the Special Issue

- Sentence + designated verb: SRL identifies the arguments of the verb and labels them with semantic roles.

- Steps (a toy sketch follows this list):

  - 1 Pruning

  - 2 Local scoring

  - 3 Joint scoring

  - (4) Fixing common errors
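
A toy, runnable rendering of steps 1-3 (spans, roles, scores and the threshold are invented, and the joint step is a greedy simplification, not any published system):

    def srl_pipeline(candidates, prune_threshold=0.1):
        """candidates: list of ((start, end) span, {role: score}) pairs."""
        # 1 Pruning: drop implausible argument candidates
        pruned = [(s, r) for s, r in candidates
                  if max(r.values()) > prune_threshold]
        # 2 Local scoring: best role for each candidate in isolation
        local = [(s, max(r, key=r.get), max(r.values())) for s, r in pruned]
        # 3 Joint scoring (greedy): keep high scorers with non-overlapping spans
        local.sort(key=lambda t: -t[2])
        chosen = []
        for span, role, score in local:
            if all(span[1] <= s[0] or span[0] >= s[1] for s, _, _ in chosen):
                chosen.append((span, role, score))
        return chosen

    print(srl_pipeline([((0, 2), {"Arg0": 0.9}),
                        ((1, 4), {"Arg1": 0.5}),
                        ((3, 5), {"ArgM-LOC": 0.6})]))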

 

Pruning

- Filtering the set of arguments for a given predicate.

  - Any subsequence of words in the sentence is an argument candidate.

- Xue and Palmer (2004)

Joint scoring      

- Good structure of labeled arguments.

- Arguments do not overlap,

- Core arguments do not repeat, etc.

- Re-ranking

- Probabilistic methods.

Conditional random fields

SRL architecture:

- Combination of systems and input annotations.

  - Increase robustness,

  - Gain coverage

  - Reduce effects of parse errors

- One may combine:

  - Output of independent basic SRL systems

  - Outputs from the same SRL system, changing input annotations or parameters.

Gain of 2-3 F1 points.

- Joint labeling…

- Dependency parsing

- Combine parsing and SRL in a single step. …

- Characterize candidate argument:
phrase type, headword, …

- Characterize verb predicate + context:
lemma, voice,

- Characterize the relation.
Syntactic + semantic.
Left/right position of the constituent with respect to the verb…

- Recall:

- 81% argument identification

- 95% assigned correct semantic role.

- SemEval-2007

- Disambiguation of 50 verbs.

- FrameNet

- 40 frames

- F1=92% assigning semantic roles

- F1 83% segmenting and labeling arguments.

- Complete analysis of semantic roles on unseen texts:

- Precision in the 60s

- Recall in the 30s

- SRL relies on syntactic structure.

- Output by statistical parser: 90% matching.

- It is common to use parser trees.

- Gold-standard trees???

- Most of the errors come from incorrect syntactic constituents.


- CoNLL-2005: Brown corpus annotated

- Performance drops below 70%

- They claimed errors are in assigning the semantic roles rather than in identifying argument boundaries.

- Spanish and Catalan, CESS-ECE corpus.

- 86% disambiguating predicates

- 83% labeling arguments.

- Chinese

- SemEval-2007, 25K sentences

- 80% core elements.

Top-performing teams use machine learning techniques.

2009/03/27

helping COST2102

2009/03/26

helping COST
bureaucracy
reading papers

2009/03/25

borderline-smote
slides ready for the reading group
PREPARING slides
helping COST 2102

2009/03/23

helping Carl with COST

Semantic role labeling: an introduction to the special issue

 

1 Introduction SRL:

APPLICATIONS:

Information extraction

Question answering

Summarization

MT

NLP tasks

A major conclusion of that work is that the patterns of syntactic alternation exhibit regularity that reflects an underlying semantic similarity among verbs, forming the basis for verb classes. ?? what does it mean?

 

2  Semantic roles in linguistics

 

3 From Linguistic Theory to computational resources:

2002 first statistical machine learning approach to SRL

Verb lexicon

Proposition bank

Each verb has a frameset listing its allowed role labelings.  

Arg0, arg1 -> to be specific with semantic roles

Abstract semantic role: agent, theme, location.

4 Approaches to automatic SRL:

Only supervised approach.

4.1 SRL Step by Step

First step: filtering or pruning.

The second step consists of a local scoring of argument candidates: argument identification and classification.

The third: joint scoring.

 ----------

FRAME SEMANTICS: 

Syntax-oriented: arguments and adjuncts

frame semantics: core and peripheral.

...


possible topics:

 

2009/03/21

start to read
semantic role labeling: an introduction to the special issue
helping Carl to prepare COST2102

2009/03/20

we decided to work on semantic role labeling
buy Spanish book for Spanish lectures + return books

2009/03/19

TCD
2 PART
machine learning

2009/03/18

ACM news

New System for Improving Decision Support Systems
Universidad Politecnica de Madrid (03/13/09) 

Universidad Politecnica de Madrid School of Computing researchers have developed a system designed to improve decision-making processes in complex situations. The system was tested on the restoration of Lake Svyatoye in Belarus, which was contaminated by the Chernobyl accident. Professors Antonio Jimenez, Alfonso Mateos, and Sixto Rios, from the Department of Artificial Intelligence's Decision Analysis and Statistics Group, aimed to account for incomplete information and any possible effects those gaps could have on decision making. Multi-Attribute Utility Theory is often used to solve decision-making problems. The theory says that after building a hierarchy of objectives and identifying a set of alternatives and each alternative's value for impact on the objectives, the decision maker's preferences are quantified. The new system uses two approaches to manage incomplete information, which occurs when the impacts of some alternatives and attributes are unknown. The first approach redistributes criteria weights with missing values or impacts throughout the objectives hierarchy and across other criteria, which means the criteria hierarchy and its assigned weights vary when each alternative is analyzed depending on the criteria with missing values. The second approach associates the criterion range, the set of possible values, as the impact for a criterion with missing values, which means the entire range of values are considered possible and equally likely.
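
As a rough illustration of the first approach only, here is a minimal sketch with a flat set of criteria (the actual system redistributes weights through an objectives hierarchy; all names and numbers here are invented):

    def maut_score(impacts, weights):
        """Additive multi-attribute utility where criteria with missing (None)
        impacts are dropped and their weight redistributed over the rest."""
        known = {c: v for c, v in impacts.items() if v is not None}
        total = sum(weights[c] for c in known)
        return sum(v * weights[c] / total for c, v in known.items())

    # the weight of the missing 'cost' impact is spread over 'risk' and 'benefit'
    print(maut_score({"cost": None, "risk": 0.7, "benefit": 0.9},
                     {"cost": 0.5, "risk": 0.2, "benefit": 0.3}))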

Machine learning

A tentative table of contents:

  • Bayesian Decision Theory
    • minimum error rate classification
    • discriminant functions and decision surfaces
  • Parametric models and parameter estimation
  • Non-parametric techniques
    • K-Nearest neighbors classifier
    • Decision trees
  • Linear models
    • Perceptron
    • Logistic regression (Maxent)
    • Large margin and kernel methods
  • Generative versus discriminative modeling
  • Sequence labeling and structure prediction
    • Hidden Markov and MaxEnt Markov Models
    • Sequence perceptron
    • Conditional Random Fields
    • Learning + Inference models

2009/03/17

summer schools

2 hours searching

1. Pisa, Italy: Evaluation, Best Practices and Collaboration for Multilingual Information Access
   15-Jun to 19-Jun (4 days), 200.00
   http://www.trebleclef.eu/summerschool_registration&grants.php
   only tuition fees; GRANT possibility

2. Bordeaux, France: (topic not noted; this is ESSLLI, see the 2009/03/16 entry)
   18-Jul to 02-Aug (15 days), 225.00
   http://esslli2009.labri.fr/
   ACCEPTED

3. Belfast, UK: artificial intelligence
   23-Aug to 29-Aug (6 days), 340.00
   http://193.61.130.114/acai09/index.html
   recommendation letter

4. Cambridge, UK: Machine learning
   29-Aug to 10-Sep (12 days), 900 pounds
   http://mlg.eng.cam.ac.uk/mlss09/registration.htm
   recommendation letter, all included

5. Petrozavodsk, Russia: Information retrieval
   11-Sep to 16-Sep (5 days), 0.00
   http://romip.ru/russir2009/eng/registration.html

6. Padua, Italy: Information retrieval
   31-Aug to 04-Sep (4 days), 500
   http://essir2009.dei.unipd.it/

 

Information Retrieval ESSIR2009

at the same time as the one in Cambridge


Registration: Apr 30, 2009

School: Aug 31 - Sep 4, 2009


Introduction

The European Summer School in Information Retrieval (ESSIR) 2009 gathers in Padua, Italy lectures and satellite meetings for exchange and dissemination in the modeling, design and implementation of advanced systems for the representation, storage, search and retrieval of information.

ESSIR 2009 will be the 7th European School in Information Retrieval. For more information on the past editions, check the History of ESSIR.


Location

Padua is a pleasant historical city, home to one of the oldest and most prestigious Universities in Europe. A little more than 400 years ago, Galileo came here as a Professor of Mathematics to spend, in his own words, "the 18 best years of my entire age". Venice is only 15 miles away, connected by frequent buses and trains. Bologna or Verona, Florence or Milan, Rome or Turin, are conveniently reached by train in one, two, or four hours, respectively.


Speakers

M. Agosti, R. Baeza-Yates, N. Fuhr, A. Gionis, P. Ingwersen, M. Lalmas, M. Melucci, J.Y. Nie, S. Robertson, S. Rüger, I. Ruthven, M. Sanderson, J. Shanahan, C.J. van Rijsbergen, H. Zaragoza.


Topics

Foundations of Information Retrieval, Indexing Techniques, Formal Models, Evaluation of Information Retrieval Systems, Users and Context, Multimedia and Multilingual Retrieval, Digital Libraries, Semi-structured Data, Machine Learning, Distributed Search, Advertising and Mining on the World Wide Web.


Chairs

2009/03/16

Got enrolled:
21st European Summer School in Logic, Language and Information
Bordeaux July 20-31, 2009
study statistics

2009/03/15

HCI in TED



Human computer interaction

2009/03/14

updating blog with keywords from the last two days' conferences

Grammar Induction and Iterated Learning


Dr. Willem (Jelle) Zuidema from the Institute for Logic, Language and Computation, University of Amsterdam


key words:

Bayesian learning

signal -> concept

 

Grammar induction:

Merging and chunking non-terminals

Chunking repeats the pattern.

 

The correct sequence can reproduce any context-free grammar.

 

PTSG: Probabilistic Tree Substitution Grammars

DOP grammar: assign all possible binary structures


Iterated learning

Cause:

Mechanism

Ontogeny -> Bayesian

Functions

Phylogeny: evolution

 

Grammar -> language -> grammar -> language -> …

Posteriors

Likelihood

Prior

Probability of the data.
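
Reading those four items as Bayes' rule for a learner's posterior over grammars $g$ given language data $d$ (my gloss of the notes, not the speaker's slides):

    $P(g \mid d) = \frac{P(d \mid g) \, P(g)}{P(d)}$

posterior = likelihood x prior / probability of the data; in iterated learning the grammar drawn from this posterior generates the language data from which the next learner infers its own grammar.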