pandas read_csv skiprows - determine rows to skip

Question

below is a csv snippet with some dummy headers while the actual frame anchored by beerId:

This work is an unpublished, copyrighted work and contains confidential information.  
beer consumption    
consumptiondate 7/24/2018
consumptionlab  H1
numbeerssuccessful  40
numbeersfailed  0
totalnumbeers   40
consumptioncomplete TRUE

beerId  Book
341027  Northern Light

this df = pd.read_csv(path_csv, header=8) code works, but the issue is that header is not always in 8 depending on a day. cannot figure out how to use lambda from help as in

skiprows : list-like or integer or callable, default None

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

to find the index row of beerId

jezrael · Accepted Answer

I think need preprocessing first:

path_csv = 'file.csv'
with open(path_csv) as f:
    lines = f.readlines()
    #get list of all possible lins starting by beerId
    num = [i for i, l in enumerate(lines) if l.startswith("beerId" )]
    #if not found value return 0 else get first value of list subtracted by 1
    num = 0 if len(num) == 0 else num[0] - 1
    print (num)
    8


df = pd.read_csv(path_csv, header=num)
print (df)
             beerId  Book
0  341027  Northern Light

pandas read_csv skiprows - determine rows to skip

Answers (1)

Related Questions