How to read and write mixed metadata-data files using pandas

Question

Scientific data often come with a metadata section before the data section. I would like to read CSV files such as the following example, keeping the top five rows separate as a metadata 'header' and doing calculations on the remainder:

Source: stackoverflow.com

Citation: StackOverflow et al. 2021: How to import and export mixed metadata - data files using pandas.

Date: 17.02.21

col_1	col_2	col_3
a	0	3
b	1	9
c	4	-2

When I am done, I would like to write the dataset (here with a new column col_4 = col_2 + col_3) back out with the metadata 'header' on top, preserving the original file structure:

Source: stackoverflow.com

Citation: StackOverflow et al. 2021: How to import and export mixed metadata - data files using pandas.

Date: 17.02.21

col_1	col_2	col_3	col_4
a	0	3	3
b	1	9	10
c	4	-2	2

Rob Raymond · Accepted Answer

Not sure why you escaped the newlines, so I removed them in the sample data.

Open the file and read its contents.
Take the first five rows as the metadata header.
Do the DataFrame manipulation on the rest.
Save the results back to the file, writing the metadata first, followed by the DataFrame contents.
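The four steps can also be wrapped into two small helper functions. This is a sketch, not part of the original answer: the names read_with_metadata and write_with_metadata and the n_meta/sep parameters are hypothetical, not pandas API. Instead of splitting the whole file into a string, it reads the metadata rows with readline() and lets pd.read_csv continue from the same open file handle.

```python
import pandas as pd


def read_with_metadata(path, n_meta=5, sep=","):
    """Return (metadata_lines, DataFrame) for a file whose first n_meta lines are metadata.

    Hypothetical helper; n_meta and sep are illustrative parameters.
    """
    with open(path, "r") as f:
        # readline() keeps each metadata line, including its trailing "\n"
        meta = [f.readline() for _ in range(n_meta)]
        # read_csv continues from the handle's current position,
        # so it only sees the column header and the data rows
        df = pd.read_csv(f, sep=sep)
    return meta, df


def write_with_metadata(path, meta, df, sep=","):
    """Write the metadata lines unchanged, then the DataFrame as CSV (hypothetical helper)."""
    # the pandas to_csv docs recommend newline="" for file objects
    with open(path, "w", newline="") as f:
        f.writelines(meta)
        df.to_csv(f, sep=sep, index=False)
```

Usage mirrors the answer: meta, df = read_with_metadata(p), then df["col_4"] = df["col_2"] + df["col_3"], then write_with_metadata(p, meta, df). Passing sep="\t" should cover the tab-separated layout shown in the question, assuming its metadata block is also exactly five lines.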

```python
import io
from pathlib import Path

import pandas as pd

filetext = """Source: stackoverflow.com
Citation: stackoverflow et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
,,,
,,,
col_1,col_2,col_3
a,0,3
b,1,9
c,4,-2"""

# create the sample file
p = Path.cwd().joinpath("so_science.txt")
with open(p, "w") as f:
    f.write(filetext)

# get file contents
with open(p, "r") as f:
    fc = f.read()

lines = fc.split("\n")
# first five rows are metadata
header = "\n".join(lines[:5])
# rest is a CSV
df = pd.read_csv(io.StringIO("\n".join(lines[5:])))
# modify DF: add col_4 = col_2 + col_3, matching the desired output
df["col_4"] = df["col_2"] + df["col_3"]

# write out metadata, then the CSV; newline="" is what the pandas docs
# recommend for file objects passed to to_csv (avoids doubled line endings on Windows)
with open(p, "w", newline="") as f:
    f.write(f"{header}\n")
    df.to_csv(f, index=False)
```
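If you would rather keep the metadata attached to the DataFrame while you work on it, the DataFrame.attrs dictionary can hold it. This is a sketch under assumptions: the "metadata" key and the N_META constant are illustrative names, it works on an in-memory string rather than a file, and the pandas docs mark attrs as experimental, so not every operation is guaranteed to carry it along. to_csv does not write attrs, so the metadata still has to be written out by hand.

```python
import io

import pandas as pd

text = """Source: stackoverflow.com
Citation: stackoverflow et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
,,,
,,,
col_1,col_2,col_3
a,0,3
b,1,9
c,4,-2
"""

N_META = 5  # number of metadata rows, as in the question (illustrative constant)

lines = text.splitlines(keepends=True)
df = pd.read_csv(io.StringIO("".join(lines[N_META:])))
# store the raw metadata lines on the frame ("metadata" is an arbitrary key)
df.attrs["metadata"] = lines[:N_META]

# in-place column assignment keeps the same object, so attrs stays attached
df["col_4"] = df["col_2"] + df["col_3"]

# serialize: metadata first, then the CSV body
out = io.StringIO()
out.write("".join(df.attrs["metadata"]))
df.to_csv(out, index=False)
result = out.getvalue()
```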
