read_csv stops reading some lines even though they all have the same format

Question

I have a CSV file with 4 lines that all have exactly the same formatting. When I read the file with pandas, it does not read all of the lines. I cannot figure out why, since the formats are the same. Please help. The file contents, my code, and the output are listed below:

tmp_csv_outfile:
6801 2017/09/28 18:56:51.390624 129.1972 107 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp]0022 > f4 37 3e 00 00 
6802 2017/09/28 18:56:51.390640 129.1972 108 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp] TEST: ~Finished Testcase: TEST0471
6803 2017/09/28 18:56:51.390646 129.1973 109 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [dia] trigger received - resetting session timeout 5000
6804 2017/09/28 18:56:51.390652 129.1975 110 XXX1 YYYY ZZZZ 908 log info verbose 1 908 :: 235 :: [dia][th1] Diagnosis Core responded, sending to the th1 Adapter (allConnected = 0)



import pandas as pd
df = pd.read_csv(tmp_csv_outfile, names=["Data"], header=None, sep=r'\s\s+$', engine='python')
print(df.tail(3))

output

                                                Data
0  6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1  6802 2017/09/28 18:56:51.390640 129.1972 108 X...

SOLVED

After a lot of digging, I found the solution at https://github.com/pandas-dev/pandas/issues/16893

After updating pandas, it works fine. Thanks @jezrael for the valuable input.

jezrael · Accepted Answer

I think the problem is with the separator, so change it to a value that does not occur in the data:

df = pd.read_csv(tmp_csv_outfile, names=["Data"], sep='¥', engine='python')
print (df)

                                                Data
0  6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1  6802 2017/09/28 18:56:51.390640 129.1972 108 X...
2  6803 2017/09/28 18:56:51.390646 129.1973 109 X...
3  6804 2017/09/28 18:56:51.390652 129.1975 110 X...

EDIT:

With the real data, this works fine for me:

df = pd.read_csv('faulty.csv', sep='|', names=['Data'])
print (df)
                                                Data
0  6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1  6802 2017/09/28 18:56:51.390640 129.1972 108 X...
2  6803 2017/09/28 18:56:51.390646 129.1973 109 X...
3  6804 2017/09/28 18:56:51.390652 129.1975 110 X...
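
For a self-contained check, here is a minimal sketch of the same idea using the four sample lines from the question. It assumes that '|' never occurs in the data. io.StringIO stands in for the real file, and the names raw_lines, raw, df and df_alt are only illustrative. The second approach, which skips the CSV parser entirely, is an alternative that is not part of the answer above.

```python
import io

import pandas as pd

# Sample lines copied from the question (trailing whitespace omitted).
raw_lines = [
    "6801 2017/09/28 18:56:51.390624 129.1972 107 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp]0022 > f4 37 3e 00 00",
    "6802 2017/09/28 18:56:51.390640 129.1972 108 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp] TEST: ~Finished Testcase: TEST0471",
    "6803 2017/09/28 18:56:51.390646 129.1973 109 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [dia] trigger received - resetting session timeout 5000",
    "6804 2017/09/28 18:56:51.390652 129.1975 110 XXX1 YYYY ZZZZ 908 log info verbose 1 908 :: 235 :: [dia][th1] Diagnosis Core responded, sending to the th1 Adapter (allConnected = 0)",
]
raw = "\n".join(raw_lines) + "\n"

# Approach 1: pick a separator that is assumed never to appear in the data,
# so every line ends up in the single "Data" column.
df = pd.read_csv(io.StringIO(raw), names=["Data"], sep="|")

# Approach 2 (alternative): skip the CSV parser and keep each line verbatim.
df_alt = pd.DataFrame({"Data": raw.splitlines()})

print(len(df), len(df_alt))
```

If the file might contain the chosen separator or quote characters, the second approach is the safer of the two because no parsing rules are applied to the lines.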
