Dutch Humor Detection by Generating Negative Examples
- Speaker: Thomas Winters
- Type: Conference talk
- Date: 2020-11-20
- Location: BeneLearn2020: 29th Belgian Dutch Conference on Machine Learning
Detecting whether a text is humorous is computationally hard, as it usually requires linguistic and common-sense insight. In machine learning, humor detection is usually modeled as a binary classification task, in which a model is trained to predict whether a given text is a joke or another type of text. Rather than using completely different non-humorous texts as negative examples, we propose using text generation algorithms that imitate the original joke dataset, which makes the task harder for the learning algorithm. We constructed several joke and non-joke datasets to test the humor detection abilities of different language technologies. In particular, we compare the humor detection capabilities of classic neural network approaches with those of the state-of-the-art Dutch language model RobBERT. In doing so, we create and compare the first Dutch humor detection systems. We found that while the other language models performed well when the non-jokes came from completely different domains, RobBERT was the only one able to distinguish jokes from generated negative examples. This result illustrates the usefulness of text generation for creating negative datasets for humor recognition, and also shows that transformer models are a large step forward in humor detection.
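The idea of generating negative examples from the jokes themselves can be illustrated with a minimal sketch. It assumes a simple word-swapping strategy: each joke is "broken" by replacing a few of its words with words taken from other jokes in the same dataset. This is an illustration under that assumption, not necessarily the generation algorithm used in the talk; the function names `break_joke` and `build_dataset`, the `n_swaps` parameter, and the example sentences are all hypothetical.

```python
import random


def break_joke(joke: str, donor_pool: list[str], n_swaps: int, rng: random.Random) -> str:
    """Return a copy of `joke` with up to `n_swaps` words replaced by donor words.

    Assumption: plain random word swapping, used only for illustration.
    """
    words = joke.split()
    if not words or not donor_pool:
        return joke
    # Pick distinct positions to overwrite, never more than the joke has words.
    positions = rng.sample(range(len(words)), k=min(n_swaps, len(words)))
    for pos in positions:
        words[pos] = rng.choice(donor_pool)
    return " ".join(words)


def build_dataset(jokes: list[str], n_swaps: int = 2, seed: int = 0) -> list[tuple[str, int]]:
    """Pair every joke (label 1) with a generated non-joke (label 0)."""
    rng = random.Random(seed)
    examples = []
    for i, joke in enumerate(jokes):
        # Donor words come from all *other* jokes, so the negatives keep the
        # vocabulary and length of the joke domain.
        donor_pool = [w for j, other in enumerate(jokes) if j != i for w in other.split()]
        examples.append((joke, 1))
        examples.append((break_joke(joke, donor_pool, n_swaps, rng), 0))
    return examples


if __name__ == "__main__":
    # Made-up placeholder sentences, not taken from the actual joke dataset.
    toy_jokes = [
        "why did the chicken cross the road",
        "what do you call a fish without eyes",
        "knock knock who is there",
    ]
    for text, label in build_dataset(toy_jokes):
        print(label, text)
```

Because the generated negatives share vocabulary, length, and topic with the real jokes, a classifier cannot rely on surface cues such as domain-specific words, which is what makes this setup harder than contrasting jokes with texts from an unrelated domain.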
Video
Slides
Related paper
What people said
Can we improve humor detection algorithms by automatically breaking real jokes? Looks like it! @pieterdelobelle @BNAIC_conf #benelearn
📝 Paper: https://t.co/qQKkgt3W0i
🌐 Blog: https://t.co/BQHXyV2QxW
🤖 Code: https://t.co/kWi1w3jOF4 pic.twitter.com/AZ3lsB1FRB
— Thomas Winters (@thomas_wint) November 19, 2020
Oh wow! Thanks @BNAIC_conf for honoring the presentation of this work with the "Best Video Award"! 😍 #benelearn
The presentation is also still available on the conference's YouTube channel: https://t.co/ksGdXlvWgg
— Thomas Winters (@thomas_wint) November 20, 2020
Related projects
RobBERT Humor Detection
Distinguishing jokes from generated non-jokes, creating the first Dutch humor detectors