mati

Reputation: 1145

How to read and write mixed metadata-data files using pandas

Scientific data often come with a metadata section before the data section. I would like to read CSV files like the following example, keeping the top 5 rows separate as a metadata 'header' and doing calculations on the remainder:

Source: stackoverflow.com

Citation: StackOverflow et al. 2021: How to import and export mixed metadata - data files using pandas.

Date: 17.02.21

col_1 col_2 col_3
a 0 3
b 1 9
c 4 -2

After finishing, I would like to write the dataset back out with the metadata 'header' on top, to keep the original file structure:

Source: stackoverflow.com

Citation: StackOverflow et al. 2021: How to import and export mixed metadata - data files using pandas.

Date: 17.02.21

col_1 col_2 col_3 col_4
a 0 3 3
b 1 9 10
c 4 -2 2

Upvotes: 0

Views: 429

Answers (1)

Rob Raymond

Reputation: 31206

I'm not sure why you escaped the newlines, so I removed them in the sample data.

  • open the file and read its contents
  • take the first five rows as the metadata header
  • do a DF manipulation
  • save the results back to the file, writing the metadata first, followed by the DF contents
import io
from pathlib import Path

import pandas as pd

filetext = """Source: stackoverflow.com
Citation: stackoverflow et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
,,,
,,,
col_1,col_2,col_3
a,0,3
b,1,9
c,4,-2"""

p = Path.cwd().joinpath("so_science.txt")
with open(p, "w") as f:
    f.write(filetext)

# get file contents
with open(p, "r") as f:
    fc = f.read()

# first five rows are metadata
header = "\n".join(fc.split("\n")[:5])
# the rest is a CSV
df = pd.read_csv(io.StringIO("\n".join(fc.split("\n")[5:])))
# modify DF
df["col_2"] = df["col_2"] + df["col_3"]

# write out meta-data and CSV
with open(p, "w") as f:
    f.write(f"{header}\n")
    df.to_csv(f, index=False)
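
The expected output in the question adds a new col_4 rather than overwriting col_2. The following is a minimal sketch of the same split / modify / rejoin idea wrapped in two helpers. The names split_meta and join_meta are hypothetical, and the sketch assumes that the metadata block is always exactly n_meta lines long and that col_4 is the sum of col_2 and col_3, which matches the sample values.

```python
import io

import pandas as pd


def split_meta(text, n_meta=5):
    """Split file text into (metadata string, DataFrame of the CSV part).

    Assumption: the first n_meta lines are metadata, the rest is plain CSV.
    """
    lines = text.split("\n")
    meta = "\n".join(lines[:n_meta])
    df = pd.read_csv(io.StringIO("\n".join(lines[n_meta:])))
    return meta, df


def join_meta(meta, df):
    """Return the metadata block followed by the DataFrame as CSV text."""
    return f"{meta}\n{df.to_csv(index=False)}"


# Same sample data as above, kept in memory instead of on disk
sample = "\n".join([
    "Source: stackoverflow.com",
    "Citation: stackoverflow et al. 2021: How to import and export mixed metadata - data files using pandas.",
    "Date: 17.02.21",
    ",,,",
    ",,,",
    "col_1,col_2,col_3",
    "a,0,3",
    "b,1,9",
    "c,4,-2",
])

meta, df = split_meta(sample)
# Assumed calculation: new column as the sum of the two numeric columns
df["col_4"] = df["col_2"] + df["col_3"]  # values: 3, 10, 2
result = join_meta(meta, df)
print(result)
```

To work with a real file, read its text into split_meta and write the string returned by join_meta back to the same path.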

Upvotes: 1
