2009/04/04

CLUSTERING BY TREE DISTANCE FOR PARSE TREE NORMALISATION:

Written by Martin Emms.

Notes by Hector Franco.

0 Abstract

Potential application: transforming interrogative sentences into indicative ones -> a step in question answering. | A tree distance is proposed -> find a pattern tree that summarises the cluster.

1 Introduction

Previous work:

Question-answering with tree-distance.

1 take the parse structure of a question, 2 match it against the parse structures of candidate answers.

Normalisation: e.g. changing passive structures to active, or interrogative to indicative.

Popular parser: Collins's probabilistic parser, trained on the Penn Treebank.

Trees are not assigned in accordance with any finite grammar.

Simple transformations -> have to be induced manually. (Very tedious.)

Method described:

Parse structures can be hierarchically clustered by tree distance, and a kind of centroid tree for a chosen cluster can be generated which exemplifies typical traits of trees within the cluster.

 

2 Tree distance

Concepts:

Source and target trees

Preserve left to right order and ancestry.

Descendant.

(No point summarising the formal definitions here; see the original paper.)
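To make the concepts above concrete, here is a minimal, unoptimised sketch of the ordered tree edit distance using the standard forest recursion (the paper builds on more efficient algorithms; unit costs are an assumption here). Mappings preserve left-to-right order and ancestry, as required:

```python
from functools import lru_cache

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)

def tree_dist(source, target, ins=1, dele=1, sub=1):
    """Ordered tree edit distance via the standard forest recursion."""
    def size(forest):
        return sum(1 + size(n.children) for n in forest)

    @lru_cache(maxsize=None)
    def fdist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:
            return size(f2) * ins      # insert everything remaining
        if not f2:
            return size(f1) * dele     # delete everything remaining
        a, b = f1[-1], f2[-1]          # rightmost roots
        return min(
            fdist(f1[:-1] + a.children, f2) + dele,   # delete a
            fdist(f1, f2[:-1] + b.children) + ins,    # insert b
            fdist(f1[:-1], f2[:-1])                   # match a with b
              + fdist(a.children, b.children)
              + (0 if a.label == b.label else sub),
        )
    return fdist((source,), (target,))
```

For example, two parses differing in one leaf word are at distance 1 under unit costs.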

2.1 Question answering by tree distance

Answers are ranked according to their tree distance from the question.

QATD: question answering by tree distance.
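The QATD ranking step is just a sort by distance. A sketch, with a toy token-overlap distance standing in for the real tree distance (the stand-in is an assumption, not the paper's measure):

```python
def rank_answers(question, answers, dist):
    """Rank candidate answers by distance to the question (QATD);
    dist is any distance function, e.g. a tree edit distance."""
    return sorted(answers, key=lambda a: dist(question, a))

# Toy stand-in distance: size of the symmetric difference of token sets
def toy_dist(q, a):
    return len(set(q.split()) ^ set(a.split()))

ranked = rank_answers("who wrote Hamlet",
                      ["Hamlet was written by Shakespeare",
                       "the weather is fine"],
                      toy_dist)
```

With a real tree distance plugged in, `answers` would be parse trees of candidate sentences rather than strings.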

Additional methods: query-expansion, query-type identification, named entity recognition.

Syntactic structures group semantically related items.

Syntactic structures might encode or represent a great deal that is not semantic in any sense.

 

Variants of tree distance:

Sub-tree: the cost of the least-cost mapping from a sub-tree of the source.

Sub-traversal: the least-cost mapping from a sub-traversal of the left-to-right post-order traversal of the source.

Structural weights: weights assigned according to the syntactic structure.

Wild cards: wild-card nodes can match any material at zero cost.

Lexical emphasis: leaf nodes have weights which are scaled up in comparison to nodes internal to the tree.

String distance: if source and target are encoded as strings (linear trees), tree distance coincides with string distance.
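The variants above can all be seen as choices of a per-node cost function. A hedged sketch (the weight table and scale factor are illustrative assumptions, not the paper's values):

```python
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    children: tuple = ()

def node_cost(op, node, other=None,
              struct_weights=None, lexical_scale=2.0):
    """Per-node edit cost combining the variants listed above.
    op is 'insert', 'delete' or 'swap'."""
    # Wild cards: a '*' node matches any node at zero cost
    if op == "swap" and "*" in (node.label, other.label):
        return 0.0
    base = 1.0
    # Structural weights: scale cost by the node's syntactic category
    if struct_weights:
        base *= struct_weights.get(node.label, 1.0)
    # Lexical emphasis: leaf (word) nodes are scaled up
    if not node.children:
        base *= lexical_scale
    if op == "swap":
        return 0.0 if node.label == other.label else base
    return base
```

Such a function would be passed to the tree-distance routine in place of unit costs.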

Results:

A tree distance which uses sub-trees, structural weights, wild cards and lexical emphasis outperforms sub-string distance, and each parameter improves it.


3 Clustering by tree distance

Uses the agglomerative clustering algorithm: pick the pair of clusters with minimal distance and merge them into a single one; repeat.

Agglomerative coefficient: a measure of overall clustering quality.

S(q): the cluster of q.

merge_dist(q): the inter-cluster distance at which q is first merged.

Agglomerative coefficient: AC = average over q of (1 - merge_dist(q)/D_f), where D_f is the final merge distance; it ranges from 0 to 1, with 1 best.

Giving different weights to different daughter types (head/complement/adjunct/…) gives better results.
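The clustering step and the coefficient can be sketched as follows, assuming average linkage over a precomputed distance matrix and the standard Kaufman–Rousseeuw form of the agglomerative coefficient (both assumptions where the notes are terse):

```python
def agglomerate(dist):
    """Average-link agglomerative clustering over a distance matrix.
    Returns each item's first-merge distance and the final merge distance."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    first_merge = [None] * n
    d_final = 0.0
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average-link distance between clusters i and j
                d = (sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                     / (len(clusters[i]) * len(clusters[j])))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        for q in clusters[i] + clusters[j]:
            if first_merge[q] is None:   # q's first merge happens now
                first_merge[q] = d
        clusters[i] += clusters[j]
        del clusters[j]
        d_final = max(d_final, d)
    return first_merge, d_final

def agglomerative_coefficient(dist):
    """AC = mean over items q of 1 - merge_dist(q) / D_f  (1 is best)."""
    first_merge, d_final = agglomerate(dist)
    return sum(1 - m / d_final for m in first_merge) / len(first_merge)
```

For two tight pairs far apart (within-pair distance 1, across 10), every item first merges at 1 and the final merge is at 10, so AC = 1 - 1/10 = 0.9.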

4 Deriving a pattern tree from a cluster

How to find the centre point of a cluster (the member with minimal total distance to the others).

Distance is Euclidean or cosine.
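The centre point described above is a medoid, which takes one line regardless of which distance function is plugged in:

```python
def medoid(members, dist):
    """Centre point of a cluster: the member with minimal total
    distance to all other members."""
    return min(members, key=lambda x: sum(dist(x, y) for y in members))

# Toy usage with absolute difference as the distance
centre = medoid([0, 1, 5], lambda a, b: abs(a - b))  # -> 1
```

In the paper's setting, `members` would be parse trees and `dist` the tree distance.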

 

New function: align_outcome(node i, param b).

b = 0 matched perfectly, b = 1 substituted, b = 2 deleted.

Used to derive an alignment summary tree, align_sum(c).

 

Final step:

Deletion nodes are deleted

Substitution nodes become wild-card trees.
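The final step can be sketched as a single traversal, assuming an `outcome` function giving each node's dominant alignment outcome (a stand-in for the paper's align_sum statistics):

```python
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    children: tuple = ()

MATCH, SUB, DEL = 0, 1, 2

def pattern(node, outcome):
    """Turn an alignment-summary tree into a pattern tree.
    outcome(n) is 0 (matched), 1 (substituted) or 2 (deleted).
    Returns a list of trees, because a deleted node promotes its children."""
    kids = tuple(k for c in node.children for k in pattern(c, outcome))
    b = outcome(node)
    if b == DEL:
        return list(kids)            # deletion nodes are deleted
    if b == SUB:
        return [Node("*", kids)]     # substitution nodes become wild cards
    return [Node(node.label, kids)]

# Toy usage: 'the' is mostly deleted, 'dog' mostly substituted
tree = Node("S", (Node("NP", (Node("the"), Node("dog"))),
                  Node("VP", (Node("barks"),))))
outcome = lambda n: {"the": DEL, "dog": SUB}.get(n.label, MATCH)
[p] = pattern(tree, outcome)
```

Here `p` is S(NP(*), VP(barks)): the deleted determiner is gone and the substituted word has become a wild card.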

 

5 Conclusions and future work

The adaptations of tree distance improve both question answering and cluster quality.

 --finish--
