Reputation: 13
I have this .log file and I don't know how to read it into a pandas DataFrame:
id | create_date
-----+----------------------------
318 | 2017-05-05 07:03:27.556697
456 | 2017-07-03 01:50:07.966652
249 | 2017-05-03 13:57:32.567373
Upvotes: 0
Views: 5099
Reputation: 5294
pd.read_table("data.csv", sep="|", skiprows=[1], header=0, parse_dates=[1]).rename(columns=lambda x: x.strip())
id create_date
0 318 2017-05-05 07:03:27.556697
1 456 2017-07-03 01:50:07.966652
2 249 2017-05-03 13:57:32.567373
sep="|"
Use | as the column separator.
skiprows=[1]
Ignore the second row, which is only decoration and would be the most problematic to parse.
header=0
Read the column names from the first row.
parse_dates=[1]
Convert the create_date column to the pandas datetime64 format (this may be optional).
rename(columns=lambda x: x.strip())
Remove the extra whitespace from the column names.
You may want to add index_col=0 if you want to make the id column your index instead of using a sequential one.
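For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. It assumes the padded layout shown in the question (the exact spacing inside raw is a guess), reads the text through io.StringIO instead of a path, and converts the dates with pd.to_datetime after stripping them instead of relying on parse_dates. Setting the index afterwards is meant to give roughly the effect of index_col=0.

```python
import io

import pandas as pd

# Sample text modelled on the question; the column padding is an assumption.
raw = """ id  |        create_date
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373
"""

df = pd.read_csv(
    io.StringIO(raw),  # a real script would pass the path of the .log file here
    sep="|",           # columns are separated by a pipe
    skiprows=[1],      # skip the -----+----- ruler line
    header=0,          # column names come from the first line
)

# Header cells keep their padding, so strip it from the column names.
df.columns = df.columns.str.strip()

# Strip the padding around the timestamps, then convert them to datetime64.
df["create_date"] = pd.to_datetime(df["create_date"].str.strip())

# Optional: use id as the index instead of a sequential one.
df = df.set_index("id")

print(df)
```

Stripping the column names first means the later lines can refer to "id" and "create_date" without having to match the padding exactly.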
Upvotes: 2
Reputation: 11192
Try this:
df=pd.read_csv('file_.csv',sep='|')
Then you can remove the -----+---------------------------- row in several ways:
df[df[' id ']!='-----+----------------------------']
df[~df[' id '].str.startswith('-')]
df.drop(0)
# drop(0) is not enough if the file contains -----+---------------------------- anywhere else, for example in a footer
df[df[' create_date '].notnull()]
# this won't work if the create_date column legitimately contains NaN values
Output:
id create_date
1 318 2017-05-05 07:03:27.556697
2 456 2017-07-03 01:50:07.966652
3 249 2017-05-03 13:57:32.567373
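Put together, the "read everything, then filter" approach could look like the sketch below. It is only an illustration: the sample text and its padding are assumptions modelled on the question, and the column names are stripped first so the filter does not depend on the exact whitespace in ' id '.

```python
import io

import pandas as pd

# Sample text modelled on the question; the column padding is an assumption.
raw = """ id  |        create_date
-----+----------------------------
 318 | 2017-05-05 07:03:27.556697
 456 | 2017-07-03 01:50:07.966652
 249 | 2017-05-03 13:57:32.567373
"""

# Read every line, including the ruler, which lands in the first column as text.
df = pd.read_csv(io.StringIO(raw), sep="|")
df.columns = df.columns.str.strip()

# Drop any row whose id cell starts with a dash (the ruler line).
is_ruler = df["id"].str.strip().str.startswith("-")
df = df[~is_ruler].reset_index(drop=True)

# The ruler forced both columns to be read as text, so convert them explicitly.
df["id"] = pd.to_numeric(df["id"].str.strip())
df["create_date"] = pd.to_datetime(df["create_date"].str.strip())

print(df)
```

Because the ruler row is read as data here, both columns come back as strings and need the explicit conversion at the end; skipping the row at read time avoids that step.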
Upvotes: 0