RobBERT

RobBERT is an artificial intelligence model which has examined hundreds of millions of Dutch sentences.

During this training, it gained all kinds of insight into how the Dutch language works. You can then show RobBERT new Dutch sentences and teach it to perform a particular task. For example, given a dataset of positive and negative book reviews, it learns to predict with 94% accuracy whether an unseen book review is positive or negative. Given sentences in which the Dutch word "die" or "dat" is missing, it predicts the correct word in more than 98% of cases.
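The die/dat task boils down to masked-word prediction: hide the position of the missing word and let the model score the candidates. Below is a minimal, hypothetical sketch using the 🤗 Transformers fill-mask pipeline; it assumes the published checkpoint includes a masked-language-modelling head, whereas the result reported above comes from the fine-tuned setup described in the paper.

import torch
from transformers import pipeline

# Hypothetical sketch: score "die" vs. "dat" at a masked position.
# Assumes the checkpoint ships a usable masked-LM head.
fill = pipeline("fill-mask", model="pdelobelle/robBERT-base",
                tokenizer="pdelobelle/robBERT-base")

# Mask the spot where "die" or "dat" should go.
sentence = f"Daar is de man {fill.tokenizer.mask_token} ik gisteren zag."

# Restrict the predictions to the two candidate words and print their scores.
for prediction in fill(sentence, targets=["die", "dat"]):
    print(prediction["token_str"], prediction["score"])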

It is not limited to these tasks, however: it can be used for many other language-based tasks, such as comparing sentences, labeling words in sentences, and classifying texts.

Technical explanation

RobBERT is a RoBERTa-based Dutch language model that achieves state-of-the-art results on many different Dutch language tasks. This transformer model was trained on the Dutch portion of the OSCAR dataset using Facebook AI's RoBERTa framework, which is an improved version of Google's BERT model.

RobBERT outperformed previous systems on several language tasks, such as sentiment analysis of Dutch book reviews and word token prediction.

Figure: Performance of RobBERT on different language tasks

How to use

Using this model in your Python code is ridiculously easy! Just add the 🤗 Transformers library as a dependency and load the model with the following lines of code:

from transformers import RobertaTokenizer, RobertaForSequenceClassification
tokenizer = RobertaTokenizer.from_pretrained("pdelobelle/robBERT-base")
model = RobertaForSequenceClassification.from_pretrained("pdelobelle/robBERT-base")

You can now use any of the RoBERTa classes to run and fine-tune RobBERT on your own data, as in the sketch below.
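As an illustration, here is a hypothetical snippet that runs the loaded classifier on a single review. Note that the sequence-classification head is freshly initialised, so the predicted label only becomes meaningful after fine-tuning on labelled data (e.g. positive/negative book reviews); the example sentence is made up.

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load RobBERT with a binary classification head (randomly initialised
# until fine-tuned on a labelled dataset).
tokenizer = RobertaTokenizer.from_pretrained("pdelobelle/robBERT-base")
model = RobertaForSequenceClassification.from_pretrained(
    "pdelobelle/robBERT-base", num_labels=2)

review = "Wat een prachtig boek, ik heb ervan genoten!"
inputs = tokenizer(review, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# 0 or 1; only meaningful once the head has been fine-tuned.
print(logits.argmax(dim=-1).item())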

Paper & blog post

For more information, you can also read our paper on arXiv. Pieter Delobelle has also thoroughly described the internals of the RobBERT model in an excellent blog post on his website.
