Reputation: 919
I have many CSV files that I need to read into a DataFrame. The only problem is that each one has a description and metadata in its first 4 lines, like this:
#Version: 1.0
#Date: 2006-11-02 00:00:08
After these lines comes ordinary CSV data. How can I deal with this? I could remove the lines manually, but I have too many files for that.
Upvotes: 1
Views: 457
Reputation: 1873
Use the skiprows parameter of pd.read_csv() (note that the name has no underscore).
According to the documentation:
skiprows: Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
So call it like this:
df = pd.read_csv("path_tocsv.csv", skiprows=lambda x: x in [0, 1, 2, 3])
The advantage of passing a callable is that you can choose exactly which rows to skip and which to keep. Otherwise, simply passing skiprows=4 skips the first 4 rows.
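To compare the two forms side by side, here is a minimal sketch on an in-memory sample. The sample text is made up for illustration (the question only shows two of the four metadata lines), and the `comment="#"` variant is an extra option that only applies if every metadata line starts with `#` and the data itself never contains that character.

```python
import io

import pandas as pd

# Made-up sample: four "#" metadata lines followed by ordinary CSV data.
sample = (
    "#Version: 1.0\n"
    "#Date: 2006-11-02 00:00:08\n"
    "#Source: example\n"
    "#Fields: 2\n"
    "a,b\n"
    "1,2\n"
    "3,4\n"
)

# Integer form: skip the first 4 lines of the file.
by_count = pd.read_csv(io.StringIO(sample), skiprows=4)

# Callable form: called with each 0-indexed line number; True means skip it.
by_callable = pd.read_csv(io.StringIO(sample), skiprows=lambda i: i in [0, 1, 2, 3])

# Alternative (assumption: metadata lines all start with "#"):
# treat "#" as a comment marker so those lines are ignored.
by_comment = pd.read_csv(io.StringIO(sample), comment="#")

print(by_count)
```

All three calls should give the same two-row frame with columns a and b; the callable form is mainly useful when the rows to skip are not a contiguous block at the top.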
Upvotes: 1
Reputation: 4648
Simply skip the first 4 rows:
df = pd.read_csv("/path/to/file", skiprows=4)
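Since the question mentions many files, the same call can be applied in a loop. This is only a sketch: the helper name `read_all`, the `*.csv` pattern, and the assumption that every file has the same columns and exactly 4 metadata lines are mine, not from the question.

```python
from pathlib import Path

import pandas as pd


def read_all(folder, pattern="*.csv", n_meta=4):
    """Read every matching file in folder, skipping n_meta leading lines each.

    Hypothetical helper: assumes all files share the same column layout.
    pd.concat raises ValueError if no file matches the pattern.
    """
    paths = sorted(Path(folder).glob(pattern))
    frames = [pd.read_csv(path, skiprows=n_meta) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)
```

Usage would look like df = read_all("/path/to/folder"). If you need to know which file each row came from, you could add a column with the file name to each frame before concatenating.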
Upvotes: 1