
Getting Started With the Google Vision API Using a Simple OCR Python Script

This post covers how I got started with the Google Vision API. It is meant as a basic getting-started guide, written mainly for my own future reference. I used the API to build a very simple Python script that creates searchable text documents from images of handwritten notebooks.

Clone Repo

The first step is to clone the Python script I created. It takes a series of photos or scanned images of text and uses the Google Vision API to generate a single text document containing all the recognized text. The script is mainly a wrapper that iterates over the images and sends each one to the API; the Google Vision API does all the heavy lifting.

https://github.com/codygula/Simple-OCR-Python-Script

In addition to the google.cloud library, the script uses the os and re modules, which are part of Python's standard library. The google-cloud-vision package (imported as google.cloud.vision) is the only additional dependency that needs to be installed.
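I haven't reproduced the repository's code here, but one plausible reason a script like this needs re alongside os is getting the pages in the right order: a plain alphabetical sort puts page10.jpg before page2.jpg. The sketch below is a hypothetical helper (the names list_images and natural_key and the extension list are my assumptions, not taken from the repo) that lists the image files in a directory in natural numeric order.

```python
import os
import re

# File extensions treated as images; an assumption for this sketch.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif", ".webp")


def natural_key(filename):
    """Split a filename into text and integer chunks so 'page2' sorts before 'page10'."""
    # re.split with a capturing group keeps the digit runs, e.g. 'page10.jpg' -> ['page', '10', '.jpg'].
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]


def list_images(directory):
    """Return full paths of image files in `directory`, in natural sort order (hypothetical helper)."""
    names = [name for name in os.listdir(directory)
             if name.lower().endswith(IMAGE_EXTENSIONS)]
    return [os.path.join(directory, name) for name in sorted(names, key=natural_key)]
```

Sorting on natural_key means scans named with unpadded page numbers still come out in reading order, and non-image files in the same folder are skipped.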

Create a Google Cloud project

This page is the starting point for setting up the Vision API:

https://cloud.google.com/vision/docs/setup

From here, click on “Go to project selector.”

Then click “CREATE PROJECT.”

Give the project a name. I keep the location setting on the default “no organization.”

Enable Billing

The next step is to enable billing. A credit card is required, but Google gives new accounts $300 in free trial credits to use within 90 days, or at least it did for me at the time of writing.

Enable the API

Next, enable the API. Click the “Enable the API” button, which opens a new tab.

In the “Confirm project” tab, click “NEXT.”

It should say “You are about to enable ‘Cloud Vision API’.” Click on “ENABLE.” This may take a second or two.

Create a virtual environment

Before installing any libraries, it is good practice in Python to create a virtual environment. If virtualenv is not already installed, install it with this command:

pip install virtualenv

Then create and activate a virtual environment with these commands, replacing [name] with a name for the environment:

virtualenv [name]
source [name]/bin/activate
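From inside Python you can confirm that the activated environment is the one actually running: in an environment created by the standard-library venv module or by recent virtualenv releases, sys.prefix points at the environment while sys.base_prefix still points at the base installation (older virtualenv releases set sys.real_prefix instead). Python 3.3 and later also ship venv itself, which does the same job as virtualenv without a separate install. This is a small sketch; the helper names in_virtualenv and create_env are mine.

```python
import os
import sys
import venv


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Old virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` with the standard-library venv module."""
    venv.create(path, with_pip=with_pip)
    # Every venv has a pyvenv.cfg file at its root; return its path as a sanity check.
    return os.path.join(path, "pyvenv.cfg")


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
```

Running the file after source [name]/bin/activate should print True; if it prints False, pip installs will go to the system Python instead.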

Install the google.cloud Python library

After creating a virtual environment, install the Google Cloud Vision API library with this command:

pip install --upgrade google-cloud-vision
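A quick way to confirm the package landed in the active environment is to check that google.cloud.vision, the import path the google-cloud-vision package provides, can be found. The sketch below uses only the standard library, so it runs even when the package is missing; the function name vision_installed is my own.

```python
import importlib.util


def vision_installed():
    """Return True if google.cloud.vision can be found by the current interpreter."""
    try:
        # find_spec imports the parent packages ("google", "google.cloud") first,
        # so a missing parent raises ModuleNotFoundError instead of returning None.
        return importlib.util.find_spec("google.cloud.vision") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    print("google-cloud-vision importable:", vision_installed())
```

If this prints False inside the virtual environment, the pip command most likely ran against a different interpreter.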

Install Google Cloud CLI

Now we need to install the Google Cloud CLI. I am using the Debian/Ubuntu installation instructions located here:

https://cloud.google.com/sdk/docs/install#deb

The specific commands can vary depending on the Linux distro.

First, refresh the package lists.

sudo apt-get update

Then we install Google Cloud CLI dependencies.

sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo

Then we import the Google Cloud public key.

curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg

Then run this command to add the Google Cloud CLI distribution URI as a package source.

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

Finally, install the Google Cloud CLI with this command. This may take several minutes.

sudo apt-get update && sudo apt-get install google-cloud-cli

Initialize Google Cloud CLI

Now that the CLI is installed, we initialize it by running this command:

gcloud init

It will say “You need to login to continue.” Press “Y” for yes, and it should open a web browser where you can log in with your Google account. It will then say “Google Cloud SDK wants access to your Google Account.” Click “Allow.”

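Back in the CLI, it should say “You are logged in as: [Google_Account_Name].” It should also prompt you to select the cloud project to use; enter the number that corresponds to the project you just created. One more step may be needed: the Python client library authenticates with Application Default Credentials, which gcloud init does not set up on its own, so if the script fails with a credentials error, run gcloud auth application-default login. After that, the script should be ready to run.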
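Before running the script, a small preflight check can save some confusion. This is a sketch under my own assumptions, not part of the repo: it checks that gcloud is on the PATH and looks for Application Default Credentials, which Google's Python client libraries use to authenticate. It checks the GOOGLE_APPLICATION_CREDENTIALS environment variable first, then the default file that gcloud auth application-default login writes (~/.config/gcloud/application_default_credentials.json on Linux and macOS, %APPDATA%\gcloud\ on Windows). It deliberately ignores other credential sources, such as the metadata server on Google Cloud VMs.

```python
import os
import shutil


def gcloud_on_path():
    """Return the full path to the gcloud executable, or None if it isn't on PATH."""
    return shutil.which("gcloud")


def find_adc_file():
    """Return the path of an Application Default Credentials file, or None.

    Simplified: checks GOOGLE_APPLICATION_CREDENTIALS first, then the default
    location written by `gcloud auth application-default login`.
    """
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return explicit if os.path.isfile(explicit) else None
    if os.name == "nt":
        base = os.path.join(os.environ.get("APPDATA", ""), "gcloud")
    else:
        base = os.path.join(os.path.expanduser("~"), ".config", "gcloud")
    default = os.path.join(base, "application_default_credentials.json")
    return default if os.path.isfile(default) else None


if __name__ == "__main__":
    print("gcloud:", gcloud_on_path() or "not found")
    print("ADC file:", find_adc_file() or "not found (try: gcloud auth application-default login)")
```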

Run the Python Script

There are several variables that can be changed in this script.

“language” specifies the language hint sent to the API. It is set to English by default. If it is not set, the API auto-detects the language, which in my experience can produce output containing random Cyrillic and Greek words, especially with handwriting.

“Directory” specifies the directory containing the images to be processed. This is required or the script will fail.

“DocumentName” specifies the document name used in the output text. It is only used to label the text in the generated output file.

“page” specifies the page number to start on when labelling the output text.

“output” specifies the name of the generated text file.
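I don't know exactly how the repo wires these variables together, so this is a hypothetical sketch of the pieces they feed. In the Vision API, a language setting is passed as a language hint in the request's image context; ocr_image sends it with document_text_detection, the feature optimized for dense text and documents, and returns the full recognized text. write_labeled_output then labels each page's text with the document name and a page counter starting at the given page. The function names, the "en" default, and the header format are my assumptions. The google.cloud.vision import is deferred into ocr_image so the other helpers work without the library installed.

```python
def build_image_context(language="en"):
    """Return the image_context for a request; a falsy language lets the API auto-detect."""
    return {"language_hints": [language]} if language else {}


def ocr_image(path, language="en", client=None):
    """Return the text Cloud Vision recognizes in the image at `path` (sketch)."""
    # Deferred import so the pure helpers work without google-cloud-vision installed.
    from google.cloud import vision

    if client is None:
        client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(
        image=image, image_context=build_image_context(language)
    )
    # Per-image errors are reported in the response rather than raised.
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def write_labeled_output(page_texts, document_name, start_page, output):
    """Write each page's text under a '<document_name> - Page N' header (hypothetical format)."""
    with open(output, "w", encoding="utf-8") as out:
        for page_number, text in enumerate(page_texts, start=start_page):
            out.write(f"{document_name} - Page {page_number}\n")
            out.write(text.rstrip("\n") + "\n\n")
```

For a whole notebook, it makes sense to create one ImageAnnotatorClient, pass it to ocr_image for each image path in page order, and hand the resulting strings to write_labeled_output along with the document name, starting page, and output filename.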

This is a very simple script for converting images of text into a text document with the Google Vision API. I made it for my own use in converting handwritten notebooks to searchable text files.