S.Sakthybaalan

Reputation: 519

How to automate Google Drive Docs OCR facility?

I use Google Drive and its "Open with Google Docs" feature to run OCR on images and save the result as a Word file (.docx), because the Word file also preserves the formatting. I have many images, which I upload to Drive and convert to editable documents one by one, because PDF conversion does not work.

Each time, I have to wait for one conversion to finish before I can start the next, which is time-consuming.

I tried the Google OCR API, but it does not preserve formatting such as bold text and alignment.

So, is there any way to automate this process using a REST API?

UPDATE

  1. Uploaded images to the Google Drive link

  2. The Right click context menu of an image in Google Drive link

  3. Google Docs in the context menu of "Open with" link

  4. After the conversion process the OCR(Auto language detected) link

  5. Finally the Google document and the image link

I tried the googleapis library from GitHub and started from its Drive sample, list.js.

My Code

'use strict';

const {google} = require('googleapis');
const sampleClient = require('../sampleclient');

const drive = google.drive({
  version: 'v3',
  auth: sampleClient.oAuth2Client,
});

async function runSample(query) {
  const params = {pageSize: 3};
  params.q = query;
  const res = await drive.files.list(params);
  console.log(res.data);
  return res.data;
}

if (module === require.main) {
  const scopes = ['https://www.googleapis.com/auth/drive.metadata.readonly'];
  sampleClient
    .authenticate(scopes)
    .then(runSample)
    .catch(console.error);
}

module.exports = {
  runSample,
  client: sampleClient.oAuth2Client,
};

Upvotes: 4

Views: 2900

Answers (1)

Tanaike

Reputation: 201428

How about this modification?

Your sample script uses googleapis, so this modification uses it as well. The image files in Drive are converted to Google Documents with OCR by the files.copy method of the Drive API. The modification assumes the following points.

  1. You are using googleapis in Node.js.
  2. When you run your script, you have already retrieved the file list with the Drive API.
    • This means that the drive object in your script can also be used for the files.copy method.

Note:

  • If you have not used the Drive API yet, please check the quickstart (version 3).

Confirmation point:

Before you run the script, please confirm the following points.

  • To use the files.copy method, add https://www.googleapis.com/auth/drive to the scopes in the if statement of list.js.

Modified script 1 (converts images to Google Docs with OCR, given the file IDs):

In this modification, runSample() was modified.

function runSample()
{
    // Please set the file IDs of the sample images in Google Drive.
    const files = [
        "### fileId1 ###",
        "### fileId2 ###",
        "### fileId3 ###",
        // Add more file IDs as needed.
    ];

    // Convert each file to Google Docs format.
    files.forEach((id) =>
    {
        const params = {
            fileId: id,
            resource:
            {
                mimeType: 'application/vnd.google-apps.document',
                parents: ['### folderId ###'], // If you want to put the converted files in a specific folder, please use this.
            },
            fields: 'id',
        };

        // Copy the image as a Google Document. Drive runs OCR during the conversion.
        // The ID of the new document is logged below.
        drive.files.copy(params, (err, res) =>
        {
            if (err)
            {
                console.error(err);
                return;
            }
            console.log(res.data.id);
        });
    });
}
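The callback version above starts every copy request at once and gives no signal when all of them have finished. As a minimal sketch of an alternative, assuming the same files.copy parameters as above and that `drive` is an authorized client passed in as an argument, the conversions can be run one at a time with async/await. The helper name `convertImagesToDocs` and its return shape are my own choices for illustration, not part of googleapis.

```javascript
'use strict';

// Hypothetical helper (not part of googleapis): converts image files to
// Google Documents one at a time and reports successes and failures.
// `drive` is assumed to be an authorized client from google.drive({version: 'v3', ...}).
async function convertImagesToDocs(drive, fileIds, destinationFolderId) {
  const converted = [];
  const failed = [];
  for (const fileId of fileIds) {
    const resource = {mimeType: 'application/vnd.google-apps.document'};
    if (destinationFolderId) {
      // Optional: place the converted document in a specific folder.
      resource.parents = [destinationFolderId];
    }
    try {
      // Called without a callback, files.copy returns a promise.
      const res = await drive.files.copy({fileId, resource, fields: 'id'});
      converted.push({sourceId: fileId, docId: res.data.id});
    } catch (err) {
      // Record the failure and continue with the remaining files.
      failed.push({sourceId: fileId, message: err.message});
    }
  }
  return {converted, failed};
}
```

Processing the files sequentially is slower than firing all requests in parallel, but one failed image does not hide the results of the others, and it may reduce the chance of hitting rate limits.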

Note:

  • Your files (images) are converted to Google Documents by the script above, and the result appears to be the same as the sample in your question. I'm not sure whether this is the quality you want, so I apologize if it is not.
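Since the question asks for a Word file (.docx), the converted Google Document can probably be downloaded in that format with the files.export method of the Drive API. This is only a sketch under assumptions: `drive` is an authorized client passed in as an argument, `docId` is an ID logged by the script above, and `exportDocAsDocx` is a hypothetical helper name. I have not checked how well the OCR formatting survives the export.

```javascript
'use strict';

// MIME type of a Word (.docx) file.
const DOCX_MIME_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

// Hypothetical helper (not part of googleapis): exports a Google Document
// as .docx and returns the file content as a Buffer.
async function exportDocAsDocx(drive, docId) {
  const res = await drive.files.export(
    {fileId: docId, mimeType: DOCX_MIME_TYPE},
    {responseType: 'arraybuffer'} // Ask for raw bytes instead of parsed text.
  );
  return Buffer.from(res.data);
}
```

The returned Buffer can then be saved, for example with `fs.writeFileSync('page1.docx', await exportDocAsDocx(drive, docId))`, where the file name is just a placeholder.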

Modified script 2 (converts only the images in a single folder to Google Docs with OCR):

  • You want to convert the files (images) to Google Documents by retrieving them from a specific folder.
  • You want to retrieve only files of type image/png, image/jpeg and image/tiff.

Sample script for retrieving the file list:

const folderId = "### folderId ###"; // Please set the folder ID including the images.
drive.files.list(
{
    pageSize: 1000,
    q: `'${folderId}' in parents and (mimeType='image/png' or mimeType='image/jpeg' or mimeType='image/tiff')`,
    fields: 'files(id)',
}, (err, res) =>
{
    if (err)
    {
        console.error(err);
        return;
    }
    const files = res.data.files;
    files.forEach((file) =>
    {
        console.log(file.id);

        // Put the body of the files.forEach callback from the script above here, changing ``id`` to ``file.id``.

    });
});

In the next version, the entire runSample() function was rewritten.

function runSample()
{
    // Set the ID of the folder that contains the files you want to convert.
    const folderId = "### folderId ###";

    // Retrieve file list.
    drive.files.list(
    {
        pageSize: 1000,
        q: `'${folderId}' in parents and (mimeType='image/png' or mimeType='image/jpeg' or mimeType='image/tiff')`,
        fields: 'files(id)',
    }, (err, res) =>
    {
        if (err)
        {
            console.error(err);
            return;
        }
        const files = res.data.files;

        // Convert each file in the retrieved file list.
        files.forEach((file) =>
        {
            const params = {
                fileId: file.id,
                resource:
                {
                    mimeType: 'application/vnd.google-apps.document',
                    parents: ['### folderId ###'], // Folder where the converted documents are placed.
                },
                fields: 'id',
            };

            // Convert a file
            drive.files.copy(params, (err, res) =>
            {
                if (err)
                {
                    console.error(err);
                    return;
                }
                console.log(res.data.id);
            });
        });
    });
}
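One limitation of the script above is that files.list returns a single page of results, so a folder with more images than fit in one page would be only partly converted. A possible way to handle this, assuming `drive` is an authorized client passed in as an argument, is to follow `nextPageToken` until it is no longer returned. The helper names `buildImageQuery` and `listImageIds` are hypothetical and only used for this sketch.

```javascript
'use strict';

const IMAGE_MIME_TYPES = ['image/png', 'image/jpeg', 'image/tiff'];

// Builds the same search query as the script above for a given folder.
function buildImageQuery(folderId) {
  const mimeClause = IMAGE_MIME_TYPES
    .map((type) => `mimeType='${type}'`)
    .join(' or ');
  return `'${folderId}' in parents and (${mimeClause})`;
}

// Hypothetical helper (not part of googleapis): collects the IDs of all
// matching images in the folder, following nextPageToken across pages.
async function listImageIds(drive, folderId) {
  const ids = [];
  let pageToken;
  do {
    const params = {
      pageSize: 1000,
      q: buildImageQuery(folderId),
      fields: 'nextPageToken, files(id)',
    };
    if (pageToken) {
      params.pageToken = pageToken;
    }
    const res = await drive.files.list(params);
    for (const file of res.data.files || []) {
      ids.push(file.id);
    }
    pageToken = res.data.nextPageToken;
  } while (pageToken);
  return ids;
}
```

The resulting array of IDs can be passed to the conversion step in place of the `files` list used in the script above.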

Upvotes: 2
