Reputation: 35
I am trying to read a text file using pd.read_csv:
df = pd.read_csv('filename.txt', delimiter = "\t")
My text file (see below) has a few lines of text before the dataset I need to import begins. How do I skip the lines that come before the dataset headers? I don't want a solution that relies on counting the lines to skip, because I have to do this for multiple text files that are similar but not identical. Any help is appreciated!
Note: I cannot upload the text file because it is confidential.
=========================================
hello 123
=========================================
Dir: /x/y/z/RTchoice/release001/data
Date: 17-Mar-2020 10:0:08
Output File: /a/b/c/filename.txt
N: 2842
-----------------------------------------
Subject col1 col2 col3
001 10.00000 1.00000 3.00000
002 11.00000 2.00000 4.00000
Upvotes: 2
Views: 2663
Reputation: 8219
Here is an attempt to 'craft magic'. The idea is to call read_csv with increasing values of skiprows until it parses without an exception and yields more than one column:
import pandas as pd
from io import StringIO
data = StringIO(
'''
=========================================
hello 123
=========================================
Dir: /x/y/z/RTchoice/release001/data
Date: 17-Mar-2020 10:0:08
Output File: /a/b/c/filename.txt
N: 2842
-----------------------------------------
Subject col1 col2 col3
001 10.00000 1.00000 3.00000
002 11.00000 2.00000 4.00000
''')
# Try increasing skiprows values until read_csv yields a multi-column frame.
for n in range(1000):
    try:
        data.seek(0)  # rewind the buffer before each attempt
        df = pd.read_csv(data, delimiter=r"\s+", skiprows=n)
    except Exception:
        print(f'skiprows = {n} failed (exception)')
    else:
        if len(df.columns) == 1:  # do not let it get away with a single-column df
            print(f'skiprows = {n} failed (single column)')
        else:
            break
print('\n', df)
output:
skiprows = 0 failed (exception)
skiprows = 1 failed (exception)
skiprows = 2 failed (exception)
skiprows = 3 failed (exception)
skiprows = 4 failed (exception)
skiprows = 5 failed (exception)
skiprows = 6 failed (exception)
skiprows = 7 failed (exception)
skiprows = 8 failed (single column)

   Subject  col1  col2  col3
0        1  10.0   1.0   3.0
1        2  11.0   2.0   4.0
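If the header line can be recognised by its first field, the trial-and-error loop could be replaced by a single scan for that line. The following is only a sketch: find_header_row is a hypothetical helper, not part of pandas, and it assumes that every file's header row starts with "Subject", which the question does not confirm for the other files.

```python
from io import StringIO


def find_header_row(lines, first_field="Subject"):
    """Return the 0-based index of the first line whose first
    whitespace-separated field equals first_field.

    `lines` can be any iterable of strings, such as an open file handle.
    The default "Subject" is an assumption taken from the sample file.
    """
    for index, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == first_field:
            return index
    raise ValueError(f"no line starts with {first_field!r}")


# Same sample text as in the question, including the leading blank line.
sample = """
=========================================
hello 123
=========================================
Dir: /x/y/z/RTchoice/release001/data
Date: 17-Mar-2020 10:0:08
Output File: /a/b/c/filename.txt
N: 2842
-----------------------------------------
Subject col1 col2 col3
001 10.00000 1.00000 3.00000
002 11.00000 2.00000 4.00000
"""

header_row = find_header_row(StringIO(sample))
print(header_row)
```

For the sample this gives 9, the same skiprows value the loop settles on. With a real file, something like `with open('filename.txt') as f: n = find_header_row(f)` followed by `pd.read_csv('filename.txt', delimiter=r"\s+", skiprows=n)` should produce the same frame without the repeated parse attempts.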
Upvotes: 2