Janneke van der Zwaan

Posts

Pulling a boat, by Kamisaka Sekka (1909)

Converting the error detection notebooks to a DVC pipeline

ocr post-correction
Because OCR post-correction is a hobby project, I don’t have a lot of time to work on it. This means it is even more important to keep track of what I did. And of course, I…
Jun 30, 2023

A basic model and training pipeline for correcting OCR mistakes

ocr post-correction
Of course, just detecting OCR mistakes doesn’t quite cut it. It is the corrections we are interested in! Since a correction model needs to be trained from scratch, I start…
Jan 18, 2023

Storing custom token classification labels in a Hugging Face dataset

tips and tricks
In my previous blog post, I showed how I created a Hugging Face dataset for detecting OCR mistakes. One thing that annoyed me about this dataset is that it didn’t…
Dec 18, 2022

Detecting OCR mistakes in text using BERT for token classification

ocr post-correction
Some years ago, I did a project with the Dutch National Library on OCR post-correction. I wanted to investigate the potential of Deep Learning for correcting OCR errors in…
Oct 21, 2022