Hitesh Somani

Reputation: 980

Remove downloaded TensorFlow and PyTorch (Hugging Face) models

I would like to remove the TensorFlow and Hugging Face models from my laptop. I found one link, https://github.com/huggingface/transformers/issues/861, but is there no command that can remove them? As mentioned in that link, deleting them manually can cause problems: we don't know which other files are linked to those models or expect a model to be present at that location, so it may simply cause an error.

Upvotes: 29

Views: 35675

Answers (5)

sifat

Reputation: 442

You can run this code to delete all cached models:

import shutil
from transformers import TRANSFORMERS_CACHE

# show where the cache lives, then delete it entirely
print(TRANSFORMERS_CACHE)
shutil.rmtree(TRANSFORMERS_CACHE)
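
Note that this wipes the whole cache, including every tokenizer and config. A slightly safer variant (a minimal sketch; shutil.rmtree raises FileNotFoundError if the directory is already gone):

import os
import shutil
from transformers import TRANSFORMERS_CACHE

# remove the cache only if it actually exists; this still deletes
# every downloaded model, tokenizer, and config in one go
if os.path.isdir(TRANSFORMERS_CACHE):
    shutil.rmtree(TRANSFORMERS_CACHE)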

Upvotes: 9

Stephen Chen

Reputation: 385

Per a comment in the transformers GitHub issue, you can locate the cache directory so that you can clean it:

from transformers import file_utils

# default location where transformers stores downloaded files
print(file_utils.default_cache_path)
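
Once you have the path, here is a minimal sketch (standard library only) to see what the cache contains, largest files first, before deleting anything:

import os
from transformers import file_utils

# list cached files with their sizes so you can decide what to remove
entries = [e for e in os.scandir(file_utils.default_cache_path) if e.is_file()]
for entry in sorted(entries, key=lambda e: e.stat().st_size, reverse=True):
    print(f"{entry.stat().st_size / 1e6:10.1f} MB  {entry.name}")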

Upvotes: 13

rivu

Reputation: 2504

Use

pip install -U "huggingface_hub[cli]"

Then

huggingface-cli delete-cache

You should now see a list of revisions that you can select/deselect.

See this link for details.
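
If you prefer to script this instead of using the interactive prompt, the same functionality is exposed in Python via huggingface_hub's documented cache-management API (a minimal sketch; the commit hash below is a placeholder):

from huggingface_hub import scan_cache_dir

# inspect the cache: each repo lists its revisions and size on disk
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.size_on_disk_str)
    for revision in repo.revisions:
        print("   revision:", revision.commit_hash)

# delete specific revisions by commit hash (placeholder hash shown)
strategy = cache_info.delete_revisions("81fd1d6e7847c99f5862c9fb81387956d99e7c11")
print("Will free", strategy.expected_freed_size_str)
strategy.execute()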

Upvotes: 46

cronoik

Reputation: 19320

The transformers library stores downloaded files in your cache. As far as I know, there is no built-in method to remove certain models from the cache, but you can script it yourself. The files are stored under a cryptic name, alongside two additional files that have .json (.h5.json in the case of TensorFlow models) and .lock appended to that name. The JSON file contains metadata that can be used to identify the file. The following is an example of such a file:

{"url": "https://cdn.huggingface.co/roberta-base-pytorch_model.bin", "etag": "\"8a60a65d5096de71f572516af7f5a0c4-30\""}

We can now use this information to create a list of your cached files as shown below:

import glob
import json
import re
from collections import OrderedDict
from transformers import TRANSFORMERS_CACHE

metaFiles = glob.glob(TRANSFORMERS_CACHE + '/*.json')
# matches model weight URLs (PyTorch .bin or TensorFlow .h5)
modelRegex = r"huggingface\.co\/(.*)(pytorch_model\.bin$|resolve\/main\/tf_model\.h5$)"

cachedModels = {}
cachedTokenizers = {}
for file in metaFiles:
    with open(file) as j:
        data = json.load(j)
        isM = re.search(modelRegex, data['url'])
        if isM:
            # strip the trailing slash/dash from the captured model name
            cachedModels[isM.group(1)[:-1]] = file
        else:
            cachedTokenizers[data['url'].partition('huggingface.co/')[2]] = file

cachedTokenizers = OrderedDict(sorted(cachedTokenizers.items(), key=lambda k: k[0]))

Now all you have to do is check the keys of cachedModels and cachedTokenizers and decide whether you want to keep them. If you want to delete one, look up its value in the dictionary and delete that file from the cache. Don't forget to also delete the corresponding *.json and *.lock files, as sketched below.
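
A minimal deletion helper under that assumption (the old flat cache layout, where the blob file is the metadata path minus its .json suffix; delete_cached_file is a hypothetical name):

import os

def delete_cached_file(meta_file):
    # the cached blob shares its name with the metadata file, minus '.json'
    blob = meta_file[:-len('.json')]
    for path in (blob, meta_file, blob + '.lock'):
        if os.path.exists(path):
            os.remove(path)

# example: remove one cached model plus its metadata and lock files
# delete_cached_file(cachedModels['roberta-base'])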

Upvotes: 10

ML85

Reputation: 715

To remove the libraries themselves:

pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip uninstall transformers

Then find where you saved your model. For example, if you saved GPT-2 with

model.save_pretrained("./english-gpt2")

then english-gpt2 is the directory name you chose when saving, and you can delete that directory manually.
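
A sketch of that manual cleanup, assuming the model was saved to ./english-gpt2 as above:

import shutil

# delete the directory created by save_pretrained
# (weights, config, and any tokenizer files saved alongside)
shutil.rmtree("./english-gpt2", ignore_errors=True)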

Upvotes: -4
