Reputation: 8009
I need to concatenate, into a single DataFrame, every produkt_monat_Monatswerte_18910101_20110331_00003.txt
file in each of the zip files on this FTP site.
This is the code that I am using so far:
import pandas as pd
from pandas.io.parsers import *
import glob
import requests
from zipfile import ZipFile
import urllib.request as ur
years = 'produkt_monat_Monatswerte_*.txt'
names = pd.DataFrame()
for year in years:
    path ="ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/monthly/kl/historical/monatswerte_?????_????????_????????_hist.zip").read()
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    names = names.concat(frame, ignore_index=True)
and it is giving me the following error:
File "<ipython-input-25-d57a1d77ecc6>", line 5
path ="ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/monthly/kl/historical/monatswerte_?????_????????_????????_hist.zip")
Upvotes: 0
Views: 281
Reputation: 842
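The syntax error itself comes from the unmatched closing parenthesis on the path line. The underlying problem is that pandas can't extract a specific inner file from a zip archive that contains several files, and the ? wildcards in the URL are not expanded into a list of archives. Download each archive yourself, pick the inner file with zipfile, and pass it to read_csv. Try the following code: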
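import pandas as pd
from ftplib import FTP
from zipfile import ZipFile
from io import BytesIO

f_root = 'ftp-cdc.dwd.de'
zips_path = '/pub/CDC/observations_germany/climate/monthly/kl/historical/'

# Log in anonymously and move to the directory that holds the archives
ftp = FTP(f_root)
ftp.login()
ftp.cwd(zips_path)

# Collect the names of all zip archives in that directory
paths = [p[0] for p in ftp.mlsd('.') if p[0].endswith('.zip')]

dfs = []
for path in paths:
    # Download the archive into memory instead of writing it to disk
    buf = BytesIO()
    ftp.retrbinary("RETR " + path, buf.write)
    z = ZipFile(buf)
    # The data file is the archive member whose name starts with 'produkt'
    zi = [x for x in z.filelist if x.filename.startswith('produkt')][0]
    df = pd.read_csv(BytesIO(z.read(zi.filename)), sep=';', encoding="cp1252")
    dfs.append(df)

ftp.quit()
# Stack all per-archive frames into one frame with a fresh index
final = pd.concat(dfs, ignore_index=True)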
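If you would rather keep the wildcard patterns from the question, the standard-library fnmatch module can apply them to plain lists of names. The following is only a sketch that uses nothing beyond the standard library. The helper names (select_archives, read_member, make_demo_zip), the demo archive contents, and the example archive name in the usage note are hypothetical. The two patterns are copied from the question.

```python
import fnmatch
from io import BytesIO
from zipfile import ZipFile

# Patterns copied from the question; everything else here is illustrative.
ARCHIVE_PATTERN = "monatswerte_?????_????????_????????_hist.zip"
MEMBER_PATTERN = "produkt_monat_Monatswerte_*.txt"


def select_archives(names, pattern=ARCHIVE_PATTERN):
    """Return the names from a directory listing that match the glob pattern."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def read_member(zip_bytes, pattern=MEMBER_PATTERN):
    """Return (name, raw bytes) of the first archive member matching pattern."""
    with ZipFile(BytesIO(zip_bytes)) as z:
        matches = fnmatch.filter(z.namelist(), pattern)
        if not matches:
            raise FileNotFoundError(f"no member matching {pattern!r}")
        return matches[0], z.read(matches[0])


def make_demo_zip():
    """Build a tiny in-memory archive with made-up contents for a dry run."""
    buf = BytesIO()
    with ZipFile(buf, "w") as z:
        z.writestr("other_file.txt", "ignored")
        z.writestr(
            "produkt_monat_Monatswerte_18910101_20110331_00003.txt",
            "col_a;col_b\n1;2\n",
        )
    return buf.getvalue()
```

For example, select_archives(["monatswerte_00003_18910101_20110331_hist.zip", "readme.pdf"]) keeps only the first name, and read_member(make_demo_zip()) returns the produkt file's name together with its bytes. In the loop above, the list passed to select_archives could come from ftp.nlst() or ftp.mlsd('.'), and the returned bytes can be wrapped in BytesIO and handed to pd.read_csv with sep=';' and encoding="cp1252" as before.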
Upvotes: 1