Reputation: 289
I need to download a project from SourceForge, but there is no obvious way to do it. In the picture (linked below, since I don't have enough reputation to embed it), it is possible to download "the latest version", but that only includes the files from the first folder; I need to download a different folder.
It is possible to download these files manually, but because there are hundreds of files and subfolders, that would be quite impractical.
Does anyone know a way to download it? I didn't find much; a few posts mentioned wget, but I tried it without any success.
Link: http://s9.postimg.org/xk2upvbwv/example.jpg
Upvotes: 28
Views: 20289
Reputation: 1259
Based on Aadharsh Aadhithya's answer, but this version can recursively download whole folder structures:
#!/usr/bin/env python3
import random
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def download_files_from_sourceforge(sf_url, download_dir):
    res = requests.get(sf_url)
    res.raise_for_status()
    soup = BeautifulSoup(res.content, "html.parser")
    files = [file.a["href"] for file in soup.find_all("th", headers="files_name_h")]
    print("sf_url =", sf_url)
    random.shuffle(files)
    print(" files =", files)
    for file_download_url in files:
        print(" file_download_url =", file_download_url)
        if file_download_url.endswith("/"):
            # is folder
            folder_name = (
                file_download_url.rstrip("/")
                .rsplit("/", maxsplit=1)[-1]
                .replace("%20", " ")
            )
            print(" folder_name =", folder_name)
            out_path = Path(download_dir, folder_name)
            out_path.mkdir(exist_ok=True)
            download_files_from_sourceforge(
                "https://sourceforge.net" + file_download_url, out_path
            )
        else:
            # is file
            filename = file_download_url.split("/")[-2].replace("%20", " ")
            print(" filename =", filename)
            out_path = Path(download_dir, filename)
            if out_path.exists():
                print(" exists, skipping")
                continue
            res = requests.get(file_download_url)
            res.raise_for_status()
            with out_path.open("wb") as f:
                f.write(res.content)
            print(f" created file {out_path}")


download_files_from_sourceforge(
    "https://sourceforge.net/projects/milestone.motorola/files/", Path(".")
)
Upvotes: 0
Reputation: 23
Here's a modification of milahu's bash script that I made for downloading multiple files. The extra bit at the end converts every downloaded [file path]/download file into just [file path].
Replace the URL at the top with the link to your project's files, most likely just by replacing "ord" with another project name.
#! /usr/bin/env bash
# license: "public domain" or "MIT license"

URL="https://sourceforge.net/projects/ord/files/"

debug=false
#debug=true

function sourceforge_get_file_urls() {
  # return one folder per line, multiple file urls per line
  local folder_url="$1"
  local html="$(curl -s "$folder_url")"
  local links="$(echo "$html" | grep '^<th scope="row" headers="files_name_h"><a href' | cut -d'"' -f6 | sed 's|^/|https://sourceforge.net/|')"
  $debug && echo "sourceforge_get_file_urls: links:" >&2
  $debug && echo "$links" | sed 's/^/ /' >&2
  # loop files
  # convert file urls to "proper basename" urls = locally resolve the first http redirect
  # example:
  # a: https:// sourceforge.net/projects/sevenzip/files/7-Zip/21.07/7z2107-src.tar.xz/download
  # b: https://downloads.sourceforge.net/project /sevenzip /7-Zip/21.07/7z2107-src.tar.xz
  $debug && echo "sourceforge_get_file_urls: looping files" >&2
  file_links=$(echo "$links" | grep '/download$')
  if [ -n "$file_links" ]; then
    echo "$file_links" |
      sed -E 's|^https://sourceforge.net/projects/([^/]+)/files/([^/]+)/(.*?)/$|https://downloads.sourceforge.net/project/\1/\2/\3|' |
      xargs echo -n
    echo
  fi
  # loop folders
  $debug && echo "sourceforge_get_file_urls: looping folders" >&2
  while read next_folder_url; do
    # recurse
    $debug && echo "sourceforge_get_file_urls: recurse from $folder_url to $next_folder_url" >&2
    sourceforge_get_file_urls "$next_folder_url"
  done < <(echo "$links" | grep '/$')
}

# generic: get url from args, download all files
if [ -n "$1" ]; then
  sourceforge_get_file_urls "$1" |
    xargs -n1 wget --no-clobber --recursive --level=1
  exit
fi

# example: download all files from the project in URL above
echo example: sourceforge_get_file_urls $URL

sourceforge_get_file_urls $URL |
while read folder_urls; do
  folder_urls=$(echo "$folder_urls" | tr ' ' $'\n')
  $debug && echo -e "folder_urls:\n$folder_urls"
  # filters from the original script, left commented out: all file urls are kept
  file_url="$folder_urls"
  #file_url=$(echo "$file_url" | grep -E '/7z[0-9]+(-src\.tar\.xz|-src\.7z|\.tar\.bz2)$')
  file_url=$(echo "$file_url")
  #if [[ "$(echo "$file_url" | wc -l)" != 1 ]]; then
    # we still have multiple file urls
    # remove *.7z
    # example file urls:
    # https://downloads.sourceforge.net/project/sevenzip/7-Zip/22.01/7z2201-src.tar.xz
    # https://downloads.sourceforge.net/project/sevenzip/7-Zip/22.01/7z2201-src.7z
    #file_url=$(echo "$file_url" | grep -E '/7z[0-9]+(-src\.tar\.xz|\.tar\.bz2)$')
  #fi
  #if [[ "$(echo "$file_url" | wc -l)" != 1 ]]; then
    # we still have multiple file urls
    echo "FIXME filter file urls:" >&2
    echo "$file_url" | sed 's/^/ /' >&2
  #fi
  #if [[ "$(echo "$file_url" | wc -l)" == 1 ]]; then
  # $debug && echo "ok: $file_url" >&2
  #fi
  for i in $file_url; do
    echo "$i"
  done;
done | sed 's/^/wget --no-clobber --recursive --level=1 /;' | sh

# note: "wget --no-clobber" assumes that existing files are valid

# list all downloaded files, sort by version
echo "downloaded files:"
find downloads.sourceforge.net/ -type f | sort -V | sed 's/^/ /'

# convert every downloaded [file path]/download into just [file path]
cd $(echo "$URL" | sed 's/^https\{0,1\}:\/\///')
files=$(find)
for i in $files; do
  if [ -f "$(echo "$i")" ]; then
    fixi="$(echo "$i" | sed 's/\/download$//')"
    mv "$fixi/download" "${fixi}_";
    rm -rf "${fixi}/";
    mv "${fixi}_" "$fixi"
  fi
done
Upvotes: 0
Reputation: 3599
Some Bash code to implement a recursive file downloader for SourceForge.
I'm using wget --no-clobber to skip existing files, which makes the downloader faster. This could be optimized further by skipping existing folders.
The example code block will download the 7-Zip source tarballs to downloads.sourceforge.net/project/sevenzip/7-Zip/.
When a SourceForge folder URL is passed to the script as an argument, it will download all files below that folder.
#! /usr/bin/env bash
# license: "public domain" or "MIT license"

debug=false
#debug=true

function sourceforge_get_file_urls() {
  # return one folder per line, multiple file urls per line
  local folder_url="$1"
  local html="$(curl -s "$folder_url")"
  local links="$(echo "$html" | grep '^<th scope="row" headers="files_name_h"><a href' | cut -d'"' -f6 | sed 's|^/|https://sourceforge.net/|')"
  $debug && echo "sourceforge_get_file_urls: links:" >&2
  $debug && echo "$links" | sed 's/^/ /' >&2
  # loop files
  # convert file urls to "proper basename" urls = locally resolve the first http redirect
  # example:
  # a: https:// sourceforge.net/projects/sevenzip/files/7-Zip/21.07/7z2107-src.tar.xz/download
  # b: https://downloads.sourceforge.net/project /sevenzip /7-Zip/21.07/7z2107-src.tar.xz
  $debug && echo "sourceforge_get_file_urls: looping files" >&2
  file_links=$(echo "$links" | grep '/download$')
  if [ -n "$file_links" ]; then
    echo "$file_links" |
      sed -E 's|^https://sourceforge.net/projects/([^/]+)/files/([^/]+)/(.*?)/download$|https://downloads.sourceforge.net/project/\1/\2/\3|' |
      xargs echo -n
    echo
  fi
  # loop folders
  $debug && echo "sourceforge_get_file_urls: looping folders" >&2
  while read next_folder_url; do
    # recurse
    $debug && echo "sourceforge_get_file_urls: recurse from $folder_url to $next_folder_url" >&2
    sourceforge_get_file_urls "$next_folder_url"
  done < <(echo "$links" | grep '/$')
}

# generic: get url from args, download all files
if [ -n "$1" ]; then
  sourceforge_get_file_urls "$1" |
    xargs -n1 wget --no-clobber --recursive --level=1
  exit
fi

# example: get only the 7z source tarballs from the sevenzip project
echo example: sourceforge_get_file_urls https://sourceforge.net/projects/sevenzip/files/7-Zip/ 7zip-files

sourceforge_get_file_urls https://sourceforge.net/projects/sevenzip/files/7-Zip/ 7zip-files |
while read folder_urls; do
  folder_urls=$(echo "$folder_urls" | tr ' ' $'\n')
  $debug && echo -e "folder_urls:\n$folder_urls"
  # pick only one file per folder
  file_url="$folder_urls"
  file_url=$(echo "$file_url" | grep -E '/7z[0-9]+(-src\.tar\.xz|-src\.7z|\.tar\.bz2)$')
  if [[ "$(echo "$file_url" | wc -l)" != 1 ]]; then
    # we still have multiple file urls
    # remove *.7z
    # example file urls:
    # https://downloads.sourceforge.net/project/sevenzip/7-Zip/22.01/7z2201-src.tar.xz
    # https://downloads.sourceforge.net/project/sevenzip/7-Zip/22.01/7z2201-src.7z
    file_url=$(echo "$file_url" | grep -E '/7z[0-9]+(-src\.tar\.xz|\.tar\.bz2)$')
  fi
  if [[ "$(echo "$file_url" | wc -l)" != 1 ]]; then
    # we still have multiple file urls
    echo "FIXME filter file urls:" >&2
    echo "$file_url" | sed 's/^/ /' >&2
  fi
  if [[ "$(echo "$file_url" | wc -l)" == 1 ]]; then
    $debug && echo "ok: $file_url" >&2
  fi
  echo "$file_url"
done |
xargs -n1 wget --no-clobber --recursive --level=1

# note: "wget --no-clobber" assumes that existing files are valid

# list all downloaded files, sort by version
echo "downloaded files:"
find downloads.sourceforge.net/ -type f | sort -V | sed 's/^/ /'
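To download a whole folder, pass its URL as the first argument. A minimal usage sketch, assuming the script above is saved as sourceforge-download.sh (the filename is my own placeholder):
bash sourceforge-download.sh "https://sourceforge.net/projects/sevenzip/files/7-Zip/"
This runs only the generic branch at the top of the script and mirrors every file below that folder into downloads.sourceforge.net/... via wget --no-clobber --recursive --level=1.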
Upvotes: 0
Reputation: 9
Here is a sample Python script you could use to download from SourceForge:
import os

import requests
from bs4 import BeautifulSoup


def download_files_from_sourceforge(sf_url, download_dir):
    r = requests.get(sf_url)
    soup = BeautifulSoup(r.content, 'html.parser')
    files = [file.a['href'] for file in soup.find_all('th', headers='files_name_h')]
    for file_download_url in files:
        filename = file_download_url.split('/')[-2]
        # Skip files that already exist
        if filename not in os.listdir(download_dir):
            r = requests.get(file_download_url)
            with open(os.path.join(download_dir, filename), 'wb') as f:
                f.write(r.content)
            print(f"created file {os.path.join(download_dir, filename)}")


download_files_from_sourceforge('https://sourceforge.net/Files/filepath',
                                'your/download/directory/here')
Upvotes: 0
Reputation: 377
In every SourceForge project or project folder page there is an RSS link (shown as an RSS icon on the page).
Right-click that RSS icon on the page of the folder or project you want to download, copy the link, and use the following Bash one-liner:
curl "<URL>" | grep "<link>.*</link>" | sed 's|<link>||;s|</link>||' | while read url; do url=`echo $url | sed 's|/download$||'`; wget $url ; done
replace "<URL>" with your RSS link for example : "https://sourceforge.net/projects/xdxf/rss?path=/dicts-babylon/001", and watch the magic happens, The RSS link will include all the files of the Sourceforge folder or project and it's sub-folders, so the script will download everything recursively.
If the above doesn't work, try the following one-liner, which extracts the links from the HTML directly. Replace <URL> with the project's files URL, for example "https://sourceforge.net/projects/synwrite-addons/files/Lexers/":
curl "<URL>" | tr '"' "\n" | grep "sourceforge.net/projects/.*/download" | sort | uniq | while read url; do url=`echo $url | sed 's|/download$||'`; wget $url ; done
Good luck
Upvotes: 23
Reputation: 1612
If you have no wget or shell installed, do it with FileZilla: sftp://[email protected] — open the connection with SFTP and your password, then browse to /home/frs/.
After that path (the field may show a ? mark), fill in the path of the folder you want to download on the remote site, in my case /home/frs/project/maxbox/Examples/.
This is the access path of the FRS (File Release System): /home/frs/project/PROJECTNAME/
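For anyone without FileZilla, roughly the same thing can be done from the command line. This is only a sketch under assumptions: USERNAME, PROJECTNAME and SomeFolder are placeholders, and frs.sourceforge.net is my guess at the host behind the obscured SFTP address above, based on the /home/frs/ path of the File Release System:
# recursively copy a release folder over SSH; placeholders, untested here
scp -r USERNAME@frs.sourceforge.net:/home/frs/project/PROJECTNAME/SomeFolder ./SomeFolder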
Upvotes: 0
Reputation: 37
Example for the above: suppose I want to download all the files from the SourceForge folder https://sourceforge.net/projects/octave/files/Octave%20Forge%20Packages/Individual%20Package%20Releases/. The following commands will do this.
wget -w 1 -np -m -A download https://sourceforge.net/projects/octave/files/Octave%20Forge%20Packages/Individual%20Package%20Releases/
grep -Rh refresh sourceforge.net | grep -o "https[^\\?]*" > urllist
wget -P OctaveForgePackages -i urllist
The Sourceforge folder mentioned above contains a lot of Octave packages as .tar.gz files. All those files would be downloaded to the folder 'OctaveForgePackages' locally!
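For reference, here is the same three-command sequence wrapped in a small script, with comments describing what each step appears to do; URL and OUT simply hold the values from the example above:
#!/usr/bin/env bash
URL="https://sourceforge.net/projects/octave/files/Octave%20Forge%20Packages/Individual%20Package%20Releases/"
OUT="OctaveForgePackages"

# 1. mirror the folder pages, keeping only the .../download redirect pages
#    (-w 1: wait 1 s between requests, -np: do not ascend above the start folder,
#     -m: mirror recursively, -A download: accept only files named "download")
wget -w 1 -np -m -A download "$URL"

# 2. the saved pages under sourceforge.net/ contain a refresh redirect to the
#    real file location; extract those https URLs (cut at the first "?")
grep -Rh refresh sourceforge.net | grep -o "https[^\\?]*" > urllist

# 3. download every URL from the list into the output directory
wget -P "$OUT" -i urllist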
Upvotes: 0
Reputation: 37
Step 1: Download the latest wget (zip file, not exe file) from https://eternallybored.org/misc/wget/.
Note: Search Google for 'wget 1.20.x' to find the proper link, if necessary. Download the 32-bit file if your system is Windows 10 32-bit, or the 64-bit file if your system is Windows 10 64-bit.
Step 2: Download the latest grep and coreutils installers from http://gnuwin32.sourceforge.net/packages.html.
Note: Search Google for 'gnuwin32' to find the proper link, if necessary. Only 32-bit installers are available.
Step 3: Extract everything in the downloaded wget zip file to C:\WgetWin32 or C:\WgetWin64.
Note: You can install wget virtually anywhere, but preferably in a folder without a space in its name.
Step 4: Install grep by double-clicking its installer, into a folder such as C:\GnuWin32.
Step 5: Install coreutils by double-clicking its installer, into the same folder where grep was installed.
Note: You can install grep and coreutils in any order (first grep and then coreutils, or vice versa) and virtually anywhere, even in the default location suggested by the installer, but preferably in a folder without a space in its name.
Step 6: Right-click the 'This PC' icon on the desktop. Select 'Properties' from the drop-down list. Select 'Advanced system settings' in the 'System' window that pops up. Click 'Environment Variables...' in the 'System Properties' window. Under 'System variables', select 'Path' and click 'Edit...'. Click 'New' in the 'Edit environment variable' pop-up window and enter the path to the wget installation folder (e.g., C:\WgetWin32 or C:\WgetWin64). Click 'New' again and enter the path to the grep and coreutils installation folder (e.g., C:\GnuWin32\bin). Then click 'OK' in the 'Edit environment variable', 'Environment Variables' and 'System Properties' windows.
Step 7: Create a DOS batch file, 'wgetcmd.bat', in the wget installation folder (e.g., C:\WgetWin32 or C:\WgetWin64) with the following lines, using a text editor:
cd C:\WgetWin32
cmd
(OR)
cd C:\WgetWin64
cmd
Step 8: Create a shortcut to this batch file on the desktop.
Step 9: Right-click the shortcut and select 'Run as administrator' from the context menu.
Step 10: Enter the following commands, either one by one or all at once, at the prompt in the command window that opens.
wget -w 1 -np -m -A download <link_to_sourceforge_folder>
grep -Rh refresh sourceforge.net | grep -o "https[^\\?]*" > urllist
wget -P <folder_where_you_want_files_to_be_downloaded> -i urllist
That's all folks! This will download all the files from the Sourceforge folder specified.
Upvotes: 2
Reputation: 9004
Sometimes there is a download link on the Summary tab, but when there isn't, I don't know of a workaround, so I use this piece of code:
var urls = document.getElementsByClassName('name')
var txt = ""
for (i = 0; i < urls.length; i++) {
    txt += "wget " + urls[i].href + "\n"
}
alert(txt)
You should open a console in your browser on the page where all the files are listed. Copy/paste/enter the code and you will be shown a list of wget commands, which you can then copy/paste/enter in your terminal.
Upvotes: 11