Reputation: 53
I have a CSV file with 4 lines that all have exactly the same format. When I read it with pandas, not all of the lines are read. I can't figure out why, since the formats are the same. Please help. Details are listed below:
tmp_csv_outfile:
6801 2017/09/28 18:56:51.390624 129.1972 107 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp]0022 > f4 37 3e 00 00
6802 2017/09/28 18:56:51.390640 129.1972 108 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp] TEST: ~Finished Testcase: TEST0471
6803 2017/09/28 18:56:51.390646 129.1973 109 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [dia] trigger received - resetting session timeout 5000
6804 2017/09/28 18:56:51.390652 129.1975 110 XXX1 YYYY ZZZZ 908 log info verbose 1 908 :: 235 :: [dia][th1] Diagnosis Core responded, sending to the th1 Adapter (allConnected = 0)
df = pd.read_csv(tmp_csv_outfile, names=["Data"], header=None, sep=r'\s\s+$', engine='python')
print(df.tail(3))
Output:
Data
0 6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1 6802 2017/09/28 18:56:51.390640 129.1972 108 X...
SOLUTION: SOLVED
After a lot of digging, I found the solution at https://github.com/pandas-dev/pandas/issues/16893
After updating pandas, it works fine. Thanks @jezrael for the valuable input.
Upvotes: 0
Views: 322
Reputation: 863801
I think the problem is with the separator, so change it to some character that does not occur in the data:
df = pd.read_csv(tmp_csv_outfile, names=["Data"], sep='¥', engine='python')
print (df)
Data
0 6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1 6802 2017/09/28 18:56:51.390640 129.1972 108 X...
2 6803 2017/09/28 18:56:51.390646 129.1973 109 X...
3 6804 2017/09/28 18:56:51.390652 129.1975 110 X...
EDIT:
With the real data, this works fine for me:
df = pd.read_csv('faulty.csv', sep='|', names=['Data'])
print (df)
Data
0 6801 2017/09/28 18:56:51.390624 129.1972 107 X...
1 6802 2017/09/28 18:56:51.390640 129.1972 108 X...
2 6803 2017/09/28 18:56:51.390646 129.1973 109 X...
3 6804 2017/09/28 18:56:51.390652 129.1975 110 X...
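For anyone who wants to try this without the original file, here is a minimal sketch of the same idea. It assumes the four sample lines from the question are held in a string (the name `raw` and the use of `io.StringIO` are only for illustration, not part of the original setup). It first checks that the chosen separator really is absent from the data, and also shows an alternative that skips CSV parsing entirely by building the frame from the raw lines.

```python
import io

import pandas as pd

# Sample lines copied from the question; `raw` stands in for the real file.
raw = (
    "6801 2017/09/28 18:56:51.390624 129.1972 107 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp]0022 > f4 37 3e 00 00\n"
    "6802 2017/09/28 18:56:51.390640 129.1972 108 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [tp] TEST: ~Finished Testcase: TEST0471\n"
    "6803 2017/09/28 18:56:51.390646 129.1973 109 XXX1 YYYY ZZZZ 908 log warn verbose 1 908 :: 235 :: [dia] trigger received - resetting session timeout 5000\n"
    "6804 2017/09/28 18:56:51.390652 129.1975 110 XXX1 YYYY ZZZZ 908 log info verbose 1 908 :: 235 :: [dia][th1] Diagnosis Core responded, sending to the th1 Adapter (allConnected = 0)\n"
)

# Choose a separator and make sure it never appears in the data,
# so every line ends up as a single field.
sep = "|"
if sep in raw:
    raise ValueError("separator occurs in the data, pick another one")

# Option 1: read_csv with a separator that is not in the data.
df = pd.read_csv(io.StringIO(raw), sep=sep, names=["Data"], header=None)

# Option 2: no CSV parsing at all, one row per line of text.
lines_df = pd.DataFrame({"Data": raw.splitlines()})

print(len(df), len(lines_df))
```

Option 2 may be the safer choice for log files like this, since it does not depend on any separator or quoting rules.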
Upvotes: 1