A. Attia

Reputation: 1720

Get Google Drive files links using Drive File Stream

I am using Drive File Stream to access my Google Drive files.
I already have a list of all my Google Drive files in Python (using os.path.walk), but I want to know whether it's possible to get each file's online link.
I am on Mac OS X. In Finder, a secondary click offers an "Open with Google Drive" option, which opens the file directly in my browser. I want to do the same thing automatically for all my files.

Upvotes: 3

Views: 8900

Answers (4)

O.O

Reputation: 1298

In this answer: https://stackoverflow.com/a/52107704/1855988, the author pointed out the problem of "local-" when trying to get the file id using a program.

Below is my solution for avoiding the "local-" issue when trying to get the IDs using Python:

...
for ...
  shareable_link = get_shareable_link(file_path)
  while 'local-' in shareable_link:
    sleep(5) # sleep in case of 'local-'
    shareable_link = get_shareable_link(file_path)
    print('trying to avoid local- issue...')
  sleep(5) # sleep between calls
  ...

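# get shareable link for a file in Google Drive
from shlex import quote
from subprocess import getoutput
from time import sleep  # used by the retry loop above
from IPython.display import HTML  # assumption: HTML is IPython's display helper (notebook use)

def get_shareable_link(file_path, return_URL=False):
  # quote() keeps paths containing spaces or quotes from breaking the shell command
  fid = getoutput("xattr -p 'user.drive.id' " + quote(file_path))
  # for testing
  if return_URL:
    return HTML(f"<a href=https://drive.google.com/file/d/{fid} target=_blank>Click Me!</a>")
  else:
    return "https://docs.google.com/uc?export=download&id=" + fid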

Upvotes: 1

debugme

Reputation: 1081

I like @lahma's answer but it doesn't work that way for me on OSX/MacOS

Google File Stream creates a file system backed by a sqlite database. To get the document IDs for your links, you can query that database. Note that it's only worth going down this route if you have a lot of files to go through or you're trying to automate something.

Looking at Google's FAQ (https://support.google.com/a/answer/2490100?hl=en), Google File Stream uses the following location for its local cache (by default):

  • Windows: %LOCALAPPDATA%\Google\DriveFS
  • Mac: ~/Library/Application Support/Google/DriveFS

In that directory is a 20 char string that's probably your id (for example: ZacGF23N0WRAbuY2TI9t).

Within there is metadata_sqlite_db
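If you want to locate that file from a script, something like the following might work. It is only a sketch: the function names are made up for illustration, and it assumes the default cache locations quoted above and that each account directory holds metadata_sqlite_db directly.

```python
import os
import sys
from pathlib import Path
from typing import List


def default_drivefs_dir() -> Path:
    """Default DriveFS cache directory (locations taken from the FAQ above)."""
    if sys.platform == "win32":
        return Path(os.environ["LOCALAPPDATA"]) / "Google" / "DriveFS"
    return Path.home() / "Library" / "Application Support" / "Google" / "DriveFS"


def find_metadata_dbs(base: Path) -> List[Path]:
    """Return every <account dir>/metadata_sqlite_db found directly under base."""
    return sorted(p for p in base.glob("*/metadata_sqlite_db") if p.is_file())
```

Calling find_metadata_dbs(default_drivefs_dir()) should then list one database per signed-in account, if the layout matches what is described here.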

You can open it with the sqlite3 command-line shell:

sqlite3 metadata_sqlite_db

Everything in google drive is in the items table. The fields that you probably care about are:

  • stable_id - what you'll use for joins locally
  • local_title - name you see on your file system
  • id - what you need to generate your link

You will also need the stable_parents table. This table links all objects to their parents. So every file or folder has a parent folder except My Drive which is your root.

This query will find a file by name (20190312.pdf) and show you what folder it's in:

SELECT 
  i.stable_id, i.id, i.local_title, 
  'parent', i2.stable_id, i2.id, i2.local_title
FROM items i
  JOIN stable_parents sp ON i.stable_id = sp.item_stable_id
  JOIN items i2 ON sp.parent_stable_id = i2.stable_id
WHERE i.local_title = '20190312.pdf';

This query would give you all documents in a folder called 'FAQ':

SELECT i.stable_id, i.id, i.local_title
FROM items i
       LEFT JOIN stable_parents sp ON i.stable_id = sp.item_stable_id
       LEFT JOIN items i2 ON sp.parent_stable_id = i2.stable_id
WHERE i2.local_title = 'FAQ' and i2.is_folder = 1;
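Since the question starts from local paths, here is a rough Python sketch that walks a path from the root down using the same two tables and returns the matching id. The function names are hypothetical, the schema is assumed to be exactly as described above (items.stable_id, items.id, items.local_title, stable_parents.item_stable_id, stable_parents.parent_stable_id), and the root is assumed to be the only item with that name and no parent row.

```python
import sqlite3
from typing import Optional, Sequence


def drive_id_for_path(conn: sqlite3.Connection, parts: Sequence[str]) -> Optional[str]:
    """Resolve e.g. ["My Drive", "FAQ", "a.pdf"] to items.id, or None if not found."""
    if not parts:
        return None
    # Root: an item with the given name that has no row in stable_parents.
    row = conn.execute(
        "SELECT i.stable_id, i.id FROM items i "
        "LEFT JOIN stable_parents sp ON i.stable_id = sp.item_stable_id "
        "WHERE i.local_title = ? AND sp.item_stable_id IS NULL",
        (parts[0],),
    ).fetchone()
    for name in parts[1:]:
        if row is None:
            return None
        # Step down: a child of the current item with the next name.
        row = conn.execute(
            "SELECT i.stable_id, i.id FROM items i "
            "JOIN stable_parents sp ON i.stable_id = sp.item_stable_id "
            "WHERE sp.parent_stable_id = ? AND i.local_title = ?",
            (row[0], name),
        ).fetchone()
    return row[1] if row else None


def view_link(file_id: str) -> str:
    """Build a browser link in the drive.google.com/file/d/<id> form."""
    return f"https://drive.google.com/file/d/{file_id}"
```

It may be safer to run this against a copy of metadata_sqlite_db, since Drive File Stream keeps the live file open.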

Upvotes: 3

lahma

Reputation: 91

I recently discovered a much more efficient way of retrieving the Drive API 'File ID' of any given file/folder located within your Google Drive File Stream file system. Although I found this functionality on a Windows system, there may be similar functionality on Mac OS X.

I do not know exactly what mechanism allows for the functionality described below, but it is nearly identical in function to the way NTFS alternate data streams work. The thing is, the properties of Drive File Stream's virtual drive show it as being formatted as FAT32, which does not support alternate data streams. Perhaps Google's file system driver is simply emulating the behavior of NTFS alternate data streams. Regardless, the Drive File Stream driver adds a mechanism for reading special metadata attached to files/folders located within its file system. This metadata can be accessed by calling 'ReadFile' on any file/folder path that is suffixed with a colon followed by a special identifier describing the metadata one wants to retrieve. These are the identifiers I've discovered so far (including descriptions for some of them):

  • user.drive.id - Drive API 'File ID' of the file/folder
  • user.drive.team_drive_id - Drive API 'Team Drive ID' of the file/folder (empty if not in a Team Drive)
  • user.drive.gdoc - Returns '1' if the file is a 'Google Docs' document or '0' if it is not
  • user.drive.email - As far as I can tell, this simply returns the email address of the Google account that Drive File Stream is signed into (I tried calling this on a file that is owned by another Google account, but it still returned my email address)
  • user.drive.itemprotostr - Provides nearly all of the metadata attached to the file, including the output of nearly all of these other identifiers
  • user.drive.md5 - The MD5 checksum of the file
  • com.apple.FinderInfo - Not sure if this is actually a valid identifier (as it does not work on Windows), but since I found it in close proximity to the others, and it may be relevant to your platform, I thought I would provide it just in case
  • user.drive.stableid
  • user.drive.pinned
  • user.drive.uncommitted
  • user.drive.progress
  • user.drive.impression
  • user.drive.is_virtual_folder
  • user.drive.thumbnail

Windows

On Windows, an easy way to read this metadata from the command line is by using the 'type' command. The following example would give you the Drive API file ID of the file 'test.txt' residing in the root directory of your Google Drive:

type "G:\My Drive\test.txt:user.drive.id"
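The same read can probably be done from Python by opening the suffixed path like a normal file. This is a minimal sketch with hypothetical helper names; it assumes Python's open() reaches the same mechanism that 'type' does and that the value is plain text.

```python
def drive_attr_path(path: str, attr: str = "user.drive.id") -> str:
    """Build the '<path>:<identifier>' form described above."""
    return f"{path}:{attr}"


def read_drive_attr(path: str, attr: str = "user.drive.id") -> str:
    """Read one metadata value by opening the suffixed path as a text file."""
    with open(drive_attr_path(path, attr), "r", encoding="utf-8") as fh:
        return fh.read().strip()
```

For example, read_drive_attr(r"G:\My Drive\test.txt") would be the counterpart of the 'type' command above.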

Note: If you are using this in some type of script that is creating new files/folders and quickly reading the 'user.drive.id' afterwards, be aware that it can take many seconds for the "real" file id to be generated. If you read the value of 'user.drive.id' and it starts with 'local', this means that it has not yet generated an actual file id. In my opinion, the best way to deal with this is to create an asynchronous loop that sleeps between checks, and then returns the file id once it no longer starts with 'local'.
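A loop like that could look roughly as follows. This is a sketch, not the code from my app: wait_for_file_id and its parameters are illustrative names, read_id is any callable you supply that returns the current value of 'user.drive.id', and the timeout is an addition so the loop cannot spin forever.

```python
import asyncio
from typing import Callable


async def wait_for_file_id(
    read_id: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 120.0,
) -> str:
    """Poll read_id() until the value no longer starts with 'local'."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        file_id = read_id().strip()
        if file_id and not file_id.startswith("local"):
            return file_id
        # Give up if another sleep would run past the deadline.
        if loop.time() + interval > deadline:
            raise TimeoutError("file id still starts with 'local'")
        await asyncio.sleep(interval)
```

Because it awaits between checks, many files can be polled concurrently, for example with asyncio.gather.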

I do not personally use Mac, so I'm definitely not the most qualified individual to offer advice on its use. With that being said, I do have a bit of experience in Linux, and I believe OS X uses a lot of the same commands since it is Unix based. If the Windows functionality directly correlates to OS X (which I am not at all sure of), I believe the following command should achieve the same thing as the Windows command provided above:

cat "/Volumes/GoogleDrive/My Drive/test.txt:user.drive.id"

I hope this info is useful to you or anyone else who might come across this. I know it was very valuable for my Python app, as it allowed me to use data that was already cached by Drive File Stream rather than having to make a separate API call for each individual file (which in my case was tens of thousands). Good luck with your project!

Upvotes: 9

stewartmcgown

Reputation: 526

If you've got the Google Drive Python API installed, then this shouldn't be too hard.

The simplest approach would be to run a query for files with that name.

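# quote the value and escape any single quotes, as the Drive query syntax requires
query = "name contains '%s'" % fileName.replace("'", "\\'")
# files.list returns a list, so the per-file fields go inside files(...)
results = drive_service.files().list(q=query, fields="files(name, id, parents, webViewLink)").execute()
files = results.get('files', [])
if len(files) == 1:
   return files[0]['webViewLink']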

This will return either one result (hooray! that's your file), or multiple results. If it's the latter, you will need to check whether the file you've picked is the right one. This could be done by checking whether the parents of any of the returned files match the folder names in the file path.

I.e., if your path is G:\My Drive\Documents\Clowns.mp4, and you have multiple results for the first search of Clowns.mp4, then you could loop through them like so:

for current_file in files:
    parent_id = current_file['parents'][0]
    parent_folder = drive_service.files().get(fileId=parent_id, fields="name").execute()
    if parent_folder['name'] == "Documents":
         print("%s is the right file" % current_file['name'])
         return current_file['webViewLink']
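If two candidates sit in folders with the same name, checking only the immediate parent is not enough. A possible extension, sketched below, walks up the whole chain of parents. matches_path is a hypothetical helper; get_file is whatever lookup you pass in, for instance a lambda wrapping drive_service.files().get(fileId=fid, fields="name, parents").execute(), and only the first parent of each item is followed.

```python
from typing import Callable, Dict, Sequence


def matches_path(
    file: Dict,
    folder_names: Sequence[str],
    get_file: Callable[[str], Dict],
) -> bool:
    """True if the file's ancestors, nearest first, match folder_names reversed."""
    current = file
    for expected in reversed(folder_names):
        parents = current.get("parents") or []
        if not parents:
            return False
        # Follow the first parent only; look it up by its file id.
        current = get_file(parents[0])
        if current.get("name") != expected:
            return False
    return True
```

For the path above, folder_names would be ["My Drive", "Documents"], assuming the root folder is reported with the name "My Drive".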

Upvotes: 0
