Scientific data often come with a metadata section before the data section. I would like to read CSV files like the following example, keeping the top 5 rows separate as a metadata 'header' and doing calculations on the remainder:
Source: stackoverflow.com
Citation: StackOverflow et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
col_1 | col_2 | col_3 |
---|---|---|
a | 0 | 3 |
b | 1 | 9 |
c | 4 | -2 |
After processing, I would like to write the dataset back out with the metadata 'header' on top, keeping the original file structure:
Source: stackoverflow.com
Citation: StackOverflow et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
col_1 | col_2 | col_3 | col_4 |
---|---|---|---|
a | 0 | 3 | 3 |
b | 1 | 9 | 10 |
c | 4 | -2 | 2 |
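The question doesn't state the calculation, but in the desired output col_4 appears to be the row-wise sum of col_2 and col_3 (0 + 3 = 3, 1 + 9 = 10, 4 + (-2) = 2). A minimal pandas sketch of just that step on the data section, assuming that is the intended calculation:

```python
import pandas as pd

# Data section of the example input (metadata lines omitted)
df = pd.DataFrame(
    {"col_1": ["a", "b", "c"], "col_2": [0, 1, 4], "col_3": [3, 9, -2]}
)

# Assumption: col_4 is the row-wise sum of col_2 and col_3
df["col_4"] = df["col_2"] + df["col_3"]  # col_4 values: 3, 10, 2
print(df.to_csv(index=False))
```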
Not sure why you escaped the newlines, so I removed the escapes in the sample data:
```python
import io
from pathlib import Path

import pandas as pd

filetext = """Source: stackoverflow.com
Citation: stackoverflow et al. 2021: How to import and export mixed metadata - data files using pandas.
Date: 17.02.21
,,,
,,,
col_1,col_2,col_3
a,0,3
b,1,9
c,4,-2"""
p = Path.cwd().joinpath("so_science.txt")
with open(p, "w") as f:
    f.write(filetext)

# get file contents
with open(p, "r") as f:
    fc = f.read()
# first five rows are metadata
header = "\n".join(fc.split("\n")[:5])
# rest is a CSV
df = pd.read_csv(io.StringIO("\n".join(fc.split("\n")[5:])))
# modify DF: add col_4 as the sum of col_2 and col_3
df["col_4"] = df["col_2"] + df["col_3"]
# write out meta-data and CSV
with open(p, "w") as f:
    f.write(f"{header}\n")
    df.to_csv(f, index=False)
```
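A variation that avoids splitting the whole file text twice: read the metadata lines with `readline()` and then hand the same open file to `pd.read_csv`, which continues parsing from the handle's current position. This is a minimal sketch, assuming the metadata is always exactly the first five lines; `read_with_metadata` and `write_with_metadata` are hypothetical helper names, not pandas functions. Opening with `newline=""` keeps each metadata line's original terminator and follows the pandas `to_csv` advice for text file objects.

```python
from pathlib import Path

import pandas as pd


def read_with_metadata(path, n_meta=5):
    """Hypothetical helper: return (metadata lines, DataFrame of the rest)."""
    with open(path, "r", newline="") as f:
        # readline() keeps line terminators, so the header can be written back verbatim
        meta = [f.readline() for _ in range(n_meta)]
        # read_csv reads from the current position, i.e. right after the metadata
        df = pd.read_csv(f)
    return meta, df


def write_with_metadata(path, meta, df):
    """Hypothetical helper: write the metadata lines, then the CSV data."""
    with open(path, "w", newline="") as f:
        f.write("".join(meta))
        df.to_csv(f, index=False)


sample = (
    "Source: stackoverflow.com\n"
    "Citation: stackoverflow et al. 2021: How to import and export mixed metadata - data files using pandas.\n"
    "Date: 17.02.21\n"
    ",,,\n"
    ",,,\n"
    "col_1,col_2,col_3\n"
    "a,0,3\n"
    "b,1,9\n"
    "c,4,-2\n"
)
p = Path("so_science.txt")
p.write_text(sample)

meta, df = read_with_metadata(p)
df["col_4"] = df["col_2"] + df["col_3"]
write_with_metadata(p, meta, df)
print(p.read_text())
```

If the number of metadata lines can vary, `n_meta` would have to be determined some other way (for example, by scanning for the column-name row), which this sketch does not attempt.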