Michele

Reputation: 1

Plot from multiple files imported with glob

I need to process hundreds of data files and I want to plot the results in a single graph. I'm using glob with a for loop to read and store the data, but I have no idea how to plot them with plotly.

    import pandas as pd
    import plotly.express as px
    import plotly.graph_objects as go
    import plotly.io as pio
    import glob

    pio.renderers.default = 'browser'

    files = glob.glob('GIRS12_L_8V_0.95bar.*')

    traces = []

    for file in files:
        dat = pd.read_csv(file, sep='  ')
        dat.columns = ['time','v(t)'] 

        fig = go.Figure()
        traces.append(go.Scatter(x = dat['time'], y = dat['v(t)']))

    px.scatter(data_frame = traces)

Is it right to call px.scatter(...)? I was calling fig.show() at the end, but the graph comes up empty and I don't know why.

Upvotes: 0

Views: 268

Answers (1)

Rob Raymond

Reputation: 31166

  • I generated a few hundred CSV files to demonstrate.
  • pathlib is a more Pythonic, object-oriented way to work with the file system, and its Path.glob() replaces glob.glob().
  • The simplest approach with Plotly is to let Plotly Express generate all of the traces. To make that straightforward, I first combine all of the data into a single pandas DataFrame.
  • As noted in the comments, a figure with this many traces, and hence such a long legend, may not be the best visualisation for what you are trying to achieve. Consider what you need to show and tune the solution accordingly.

    from pathlib import Path
    import pandas as pd
    import numpy as np
    import plotly.express as px

    # location where the files live
    p = Path.cwd().joinpath("SO_csv")
    if not p.is_dir():
        p.mkdir()
    # generate a few hundred sample files
    for i in range(400):
        pd.DataFrame(
            {
                "time": pd.date_range("00:00", freq="30min", periods=47),
                "v(t)": pd.Series(np.random.uniform(1, 5, 47)).sort_values(),
            }
        ).to_csv(p.joinpath(f"GIRS12_L_8V_0.95bar.{i}"), index=False)

    # read and concat all the CSVs into one dataframe, adding a column that holds the filename
    # scatter this dataframe: one trace / color per CSV
    fig = px.scatter(
        pd.concat(
            [pd.read_csv(f).assign(name=f.name) for f in p.glob("GIRS12_L_8V_0.95bar.*")]
        ),
        x="time",
        y="v(t)",
        color="name",
    )
    # outside a notebook the figure has to be shown explicitly
    fig.show()


Upvotes: 1
