Reputation: 15
My Python program generates a CSV output file. I am currently able to run test.py on the host machine, and sample_output.csv is generated.
However, when running the program through a Docker container, I have difficulty locating the sample_output.csv file. Below are my requirements.txt and Dockerfile.
# requirements.txt
numpy==1.19.4
pandas==1.2.0
python-dateutil==2.8.1
pytz==2020.5
scipy==1.5.4
six==1.15.0
# Dockerfile
FROM python:3
WORKDIR /demo
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "-u", "./test.py"]
The Docker image builds fine with docker build -t imagename . However, after docker run imagename, no CSV file can be found on the host.
I would like help locating the sample_output.csv file after running a Docker container based on this image.
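For reference, the exact commands used to build and run the image:

docker build -t imagename .
docker run imagename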
# test.py
import pandas as pd
import numpy as np
import os
from scipy.stats import uniform, exponweib
from scipy.special import gamma
from scipy.optimize import curve_fit

N_SUBSYS = 30
STEPS = 100
LIMIT = 101*STEPS
times = np.arange(0, LIMIT, STEPS)

# Create the output directory on first run
if not os.path.exists('./output'):
    os.mkdir('./output')
    print("Directory")


# A subsystem whose failure times follow a Weibull distribution
# with randomly drawn shape (beta), scale (eta) and sample size
class WeibullFailure():
    def __init__(self):
        N_TRAINS = 92
        LOWER_BETA = 0.9
        RANGE_BETA = 0.3
        LOWER_LOGSCALE = 4
        RANGE_LOGSCALE = 1.5
        LOWER_SIZE = 4
        RANGE_SIZE = 8
        gensize = N_TRAINS * int(uniform.rvs(LOWER_SIZE, RANGE_SIZE))
        genbeta = uniform.rvs(LOWER_BETA, RANGE_BETA)
        genscale = np.power(10, uniform.rvs(LOWER_LOGSCALE, RANGE_LOGSCALE))
        self.beta = genbeta
        self.eta = genscale
        self.size = gensize

    def generate_failures(self):
        return exponweib.rvs(
            a=1, loc=0, c=self.beta, scale=self.eta, size=self.size
        )

    def __repr__(self):
        string = f"Subsystem ~ ({self.size} Instances)"
        string += f" Weibull({self.eta:.2f}, {self.beta:.4f})"
        return string


def get_cumulative_failures(failure_times, times):
    cumulative_failures = {
        i: np.histogram(ft, times)[0].cumsum()
        for i, ft in failure_times.items()
    }
    cumulative_failures = pd.DataFrame(cumulative_failures, index=times[1:])
    return cumulative_failures


def fit_failures(cumulative_failures, subsystems):
    fitted = {}
    for i, x in cumulative_failures.items():
        size = subsystems[i].size
        popt, _ = curve_fit(
            lambda x, a, b: np.exp(a)*np.power(x, b), x.index, x.values
        )
        fitted[i] = (np.exp(-popt[0]/popt[1])*size, popt[1])
    return fitted


def kl_divergence(p1, p2):
    em_constant = 0.57721  # Euler-Mascheroni constant
    eta1, beta1 = p1
    eta2, beta2 = p2
    e11 = np.log(beta1/np.power(eta1, beta1))
    e12 = np.log(beta2/np.power(eta2, beta2))
    e2 = (beta1 - beta2)*(np.log(eta1) - em_constant/beta1)
    e3 = np.power(eta1/eta2, beta2)*gamma(beta2/beta1 + 1) - 1
    divergence = e11 - e12 + e2 + e3
    return divergence


subsystems = {i: WeibullFailure() for i in range(N_SUBSYS)}
failure_times = {i: s.generate_failures() for i, s in subsystems.items()}
cumulative_failures = get_cumulative_failures(failure_times, times)
fitted = fit_failures(cumulative_failures, subsystems)
divergences = {
    i: kl_divergence(f, [subsystems[i].eta, subsystems[i].beta])
    for i, f in fitted.items()
}
expected_failures = {i: np.power(times[1:]/s.eta, s.beta)*s.size
                     for i, s in subsystems.items()}
expected_failures = pd.DataFrame(expected_failures, index=times[1:])
modeled_failures = {i: np.power(times[1:]/f[0], f[1])*subsystems[i].size
                    for i, f in fitted.items()}
modeled_failures = pd.DataFrame(modeled_failures, index=times[1:])

cols = ['eta', 'fit_eta', 'beta', 'fit_beta', 'kl_divergence', 'n_instance']
out = pd.concat([
    pd.DataFrame({i: [s.size, s.eta, s.beta] for i, s in subsystems.items()},
                 index=['n_instance', 'eta', 'beta']).T,
    pd.DataFrame(fitted, index=['fit_eta', 'fit_beta']).T,
    pd.Series(divergences, name='kl_divergence')
], axis=1)[cols]
out.to_csv('./output/sample_output.csv')

# Debug check: prints only if the CSV was NOT written
if not os.path.exists('./output/sample_output.csv'):
    print("Hello")
Upvotes: 0
Views: 2684
Reputation: 20206
Docker containers are isolated from the host by definition. When you run something in a container, its output stays in the container.
You can mount a host directory into the container at the path where you think the script output should appear. You can do this with the -v (volume) option:
docker run -v /host/path:/container/path ...
Multiple volumes can be specified:
docker run -v /host/path:/container/path -v /another/host/path:/another/container/path ...
After that, the host directory with all its contents will appear in the container exactly as it is on the host, and if your program adds or replaces anything in there, you will see the change on the host.
UPD: Looking at test.py, your output file should be in /demo/output, so you can mount some host directory there, for example your current directory: docker run -v $(pwd):/demo/output ...
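A complete command for this question might look like this (a sketch, assuming the image was tagged imagename as in the question):

docker run -v "$(pwd)":/demo/output imagename
ls sample_output.csv    # the CSV now appears in the current host directory

Since the mount makes /demo/output exist before the script runs, the os.path.exists check in test.py skips the mkdir and writes straight into the mounted directory.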
Upvotes: 0
Reputation: 2167
I would recommend redirecting the output to the console instead of a file, something like this:
from io import StringIO
output = StringIO()
out.to_csv(output)
print(output.getvalue())
and then run your container, redirecting its output to a file on the host:
docker run <image> > output.csv
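Note that anything else the script prints (such as the "Directory" message in test.py) will end up in output.csv as well. If you take this route, a minimal workaround is to send diagnostics to stderr, which the > redirection does not capture:

import sys
print("Directory", file=sys.stderr)  # stderr bypasses the stdout redirection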
Upvotes: 2