Janneke van der Zwaan
About
Categories
All (4)
ocr post-correction (3)
tips and tricks (1)
Posts
Converting the error detection notebooks to a DVC pipeline
ocr post-correction
Because OCR post-correction is a hobby project, I don’t have a lot of time to work on it. This means it is even more important to keep track of what I did. And of course, I…
Jun 30, 2023
A basic model and training pipeline for correcting OCR mistakes
ocr post-correction
Of course, just detecting OCR mistakes doesn’t quite cut it. It is the corrections we are interested in! Since a correction model needs to be trained from scratch, I start…
Jan 18, 2023
Storing custom token classification labels in a Hugging Face dataset
tips and tricks
In my previous blog post, I showed how I created a Hugging Face dataset for detecting OCR mistakes. One thing that annoyed me about this dataset is that it didn’t…
Dec 18, 2022
Detecting OCR mistakes in text using BERT for token classification
ocr post-correction
Some years ago, I did a project with the Dutch National Library on OCR post-correction. I wanted to investigate the potential of Deep Learning for correcting OCR errors in…
Oct 21, 2022