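Tesseract is an open-source OCR engine that supports many languages. Official site:
Documentation:
1. Installing Tesseract on macOS
Installing with brew install --with-training-tools tesseract now fails with Error: invalid option: --with-training-tools, because that option no longer exists. Since I wanted the training tools installed along with Tesseract, I ended up building from source: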
# Packages which are always needed.
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc

git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install training tools.
make training
sudo make training-install
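Next, download the language packs.
Download the .traineddata files you need and copy them into the tessdata folder.
Language pack location:
Once all of this is done, you can check the recognition result from the terminal: tesseract 111.jpg stdout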
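The same terminal check can also be driven from Java. The following is only a minimal sketch using the JDK's ProcessBuilder. It assumes the tesseract binary is on the PATH and Java 9 or later (for readAllBytes). The class name TesseractCli and its methods are hypothetical names introduced for illustration. The -l flag is the command-line option for selecting a language.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tesseract or tess4j.
final class TesseractCli {

    // Builds the argument list for: tesseract <image> stdout [-l <lang>]
    static List<String> buildCommand(String imagePath, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add("stdout");
        if (lang != null && !lang.isEmpty()) {
            cmd.add("-l");
            cmd.add(lang);
        }
        return cmd;
    }

    // Runs the command and returns whatever tesseract printed to stdout.
    // Assumes the tesseract binary is installed and on the PATH.
    static String run(String imagePath, String lang)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath, lang))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tesseract's own errors
                .start();
        String text = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }
}
```

Calling TesseractCli.run("111.jpg", "chi_sim") would then return the recognized text as a string, provided the chi_sim language pack is installed.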
Installation reference:
2. OCR in Java with tess4j
Add the tess4j Maven dependency:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.1</version>
</dependency>
Run the recognition demo:
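import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Tess4jOcrTest {

    // The original snippet uses a `log` field it does not declare; an SLF4J logger is assumed here.
    private static final Logger log = LoggerFactory.getLogger(Tess4jOcrTest.class);

    public static void main(String[] args) {
        String basePath = "/Users/seapeak/Desktop/";
        test1(basePath + "555.jpg");
    }

    /**
     * Recognizes the text in the image at the given path and logs the result.
     *
     * @param path path of the image file
     */
    public static void test1(String path) {
        File file = new File(path);
        ITesseract it = new Tesseract();
        // If the tessdata directory has not been moved, use ".":
        // it.setDatapath(".");
        // If the tessdata directory has been moved, point to its location:
        it.setDatapath("/Users/seapeak/Desktop/it/java/tesseract/tessdata/");
        // Use chi_sim when the image is mostly Chinese; use eng when it is mostly Latin characters.
        it.setLanguage("chi_sim");
        try {
            String result = it.doOCR(file);
            log.info("Recognition result: {}", result);
        } catch (TesseractException e) {
            // Pass the exception as the last argument so the stack trace is logged.
            log.error("Tess4jOcrTest TesseractException", e);
        }
    }
}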
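The demo only works if the directory passed to setDatapath actually contains the .traineddata file for the language passed to setLanguage. Below is a minimal sketch of a check to run before doOCR, using only the JDK. The class TessdataCheck and the method missingLanguages are hypothetical names introduced for illustration. It assumes language specs follow Tesseract's "+"-separated convention (for example chi_sim+eng).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of tess4j.
final class TessdataCheck {

    // Returns the language codes from a spec such as "chi_sim+eng"
    // that have no <lang>.traineddata file in the given directory.
    static List<String> missingLanguages(Path tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            if (lang.isEmpty()) {
                continue; // tolerate stray "+" separators
            }
            Path trainedData = tessdataDir.resolve(lang + ".traineddata");
            if (!Files.isRegularFile(trainedData)) {
                missing.add(lang);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass your own tessdata directory and language spec.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        String langs = args.length > 1 ? args[1] : "chi_sim";
        List<String> missing = missingLanguages(dir, langs);
        if (missing.isEmpty()) {
            System.out.println("All language files found in " + dir);
        } else {
            System.out.println("Missing in " + dir + ": " + missing);
        }
    }
}
```

Running this against the directory given to setDatapath shows up front which language packs still need to be downloaded, rather than finding out from a failure inside the native library.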
If running the demo fails with the following language-not-found error, pass the location of your tessdata directory to setDatapath:
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Reference:
3. Using the training tools (to be continued)
See also:
Reposted from: http://txadi.baihongyu.com/