Sunday, November 18, 2012

Comparison of Semantic Role Labeling (aka Shallow Semantic Parsing) Software

There are quite a few high-quality semantic role labelers out there. I recently tried a few of them out and thought I'd share my experiences.

If you are unfamiliar with SRL, from Wikipedia:
Semantic role labeling, sometimes also called shallow semantic parsing, is a task in natural language processing consisting of the detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles. For example, given a sentence like "Mary sold the book to John", the task would be to recognize the verb "to sell" as representing the predicate, "Mary" as representing the seller (agent), "the book" as representing the goods (theme), and "John" as representing the recipient. This is an important step towards making sense of the meaning of a sentence. A semantic representation of this sort is at a higher-level of abstraction than a syntax tree. For instance, the sentence "The book was sold by Mary to John" has a different syntactic form, but the same semantic roles.
SRL is generally the final step in an NLP pipeline consisting of tokenizer -> tagger -> syntactic parser -> SRL. The following tools implement various parts of this pipeline, typically using existing external libraries for the steps up to role labeling.
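To make the Wikipedia example concrete, here is a minimal Python sketch of what a labeler's result could look like as data. Everything in it is hand-written for illustration: the `Frame` class, the `same_semantics` helper and the role names (taken from the quote above) are my own assumptions, not the output format of any tool on this list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One predicate plus its labeled arguments (hand-annotated, not tool output)."""
    predicate: str
    roles: tuple  # (role, text) pairs in surface order


# "Mary sold the book to John"
active = Frame("sell", (("agent", "Mary"), ("theme", "the book"), ("recipient", "John")))

# "The book was sold by Mary to John"
passive = Frame("sell", (("theme", "the book"), ("agent", "Mary"), ("recipient", "John")))


def same_semantics(a: Frame, b: Frame) -> bool:
    """True when both frames share a predicate and the same role fillers, ignoring order."""
    return a.predicate == b.predicate and dict(a.roles) == dict(b.roles)


print(same_semantics(active, passive))  # True
```

The two sentences differ in word order (and so in syntax), but the role-to-filler mapping is identical, which is the point of the passive example in the quote.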

I've provided sample output where possible for the sentence: "Remind me to move my car friday." Ideally, an SRL system should extract the two predicates (remind and move) and their proper arguments (including the temporal "friday" argument).
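As a rough sketch of that target, the snippet below writes the ideal analysis down by hand and checks a labeler result against it. The role names (`recipient`, `content`, `theme`, `temporal`) are descriptive placeholders rather than any tool's tag set, attaching "friday" to "move" is my reading of the sentence, and the `partial` result is a made-up example, not real output from any of the tools below.

```python
# Hand-written target analysis for "Remind me to move my car friday."
# Role names are placeholders; real tools use their own label inventories.
EXPECTED = {
    "remind": {"recipient": "me", "content": "to move my car friday"},
    "move": {"theme": "my car", "temporal": "friday"},
}


def missing_items(output: dict) -> list:
    """Return the expected (predicate, role) pairs that `output` lacks or fills differently."""
    missing = []
    for predicate, roles in EXPECTED.items():
        found = output.get(predicate, {})
        for role, text in roles.items():
            if found.get(role) != text:
                missing.append((predicate, role))
    return missing


# Hypothetical result: both predicates found, but the temporal argument is dropped.
partial = {
    "remind": {"recipient": "me", "content": "to move my car friday"},
    "move": {"theme": "my car"},
}

print(missing_items(partial))  # [('move', 'temporal')]
```

To use something like this on a real tool, you would first have to map that tool's output format and labels onto the placeholder names, which is a separate job for each labeler.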

Without further ado...

Labelers

In order from most recent release to oldest:

Mate-tools

Authors: Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues
Latest release: Nov 2012
Comes with a nice server/web interface. Has trained models for English, Chinese and German. A newer version with a graph-based parser is available, but it does not come with trained models. Achieved some top scores at the CoNLL 2009 shared task (SRL-only). You can try it out yourself here: http://barbar.cs.lth.se:8081/
~1.5 GB RAM
Example output

SEMAFOR

Authors: Dipanjan Das, Andre Martins, Nathan Schneider, Desai Chen and Noah A. Smith at Carnegie Mellon University.
Latest release: May 2012
Trained on FrameNet. Extracts nominal frames as well as verbal ones.
Resource intensive (~8 GB RAM for me on 64-bit).
Example output

SENNA

Authors: R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. Kuksa
Latest release: August 2011
The only completely self-contained library on this list. Very fast and efficient C code. Non-commercial license.
~180 MB RAM

SwiRL

Author: Mihai Surdeanu
Latest release: 2007
If you want to compile with gcc > 4.3, you need to add some explicit C headers (much easier than trying to install multiple gccs!). I put a patched version up on GitHub if you're interested (I also made some OS X compatibility changes and hacked on a server mode): github.com/kvh/SwiRL
C++ code; uses AdaBoost and the Charniak parser. Fast and efficient.
~150 MB RAM
Example output

Shalmaneser

Authors: K. Erk and S. Pado
Latest release: 2007
You'll need to download and install TnT, TreeTagger, the Collins parser and MALLET to get this running. Uses actual FrameNet labels (including nominal targets), and comes with pre-trained classifiers for FrameNet 1.3.
Low memory usage.

Curator

I couldn't get a local install of this working. The web demo works though, so you can give that a go. You can see all their software demos here: http://cogcomp.cs.illinois.edu/curator/demo/

LTH

Authors: Lund University
This work has been subsumed by Mate-tools.
Example output

Conclusions

The Java libraries can get memory hungry, so if you are looking for something more lightweight, I would recommend either SwiRL or SENNA. In terms of labeling performance, direct comparisons between most of these libraries are hard due to their varied outputs and objectives. Most perform at or near state of the art, so it's more about what fits your needs.

Let me know if I missed any!

5 comments:

  1. Thanks, this comparison has been helpful.

  2. I hope you will forgive what is probably a stupid question, but I have been having trouble getting Senna to work from the command line -- do you know of any good online tutorials/walk-throughs? Thanks,

  3. A shot in the dark, but maybe you can help me. I tried to compile SwiRL from your updated source code, but it aborted with the complaint "make: *** No rule to make target "Assert.h", needed by "all-am"." Do you have any idea how to fix this? (I'm using Ubuntu 13.10.)

  4. See also: clearnlp: http://clearnlp.wikispaces.com/ which can do SRL
