Reputation: 31
I am trying to read a tab-separated text file using pandas. The file looks like this:
14.38 14.21 0.8951 5.386 3.312 2.462 4.956 1
14.69 14.49 0.8799 5.563 3.259 3.586 5.219 1
14.11 14.12 0.8911 5.422 3.302 2.723 5 1
Some lines have extra tabs. If I use read_csv
or read_fwf
and specify sep='\t'
. I get results that look like this:
0 15.26\t14.84\t0.871\t5.763\t3.312\t2.221\t5.22\t1
1 14.88\t14.57\t0.8811\t5.554\t3.333\t1.018\t4.9
Do you have any suggestions as to what parameters I could specify to deal with this problem?
Solution:
use pd.read_csv(filename, delim_whitespace=True)
Upvotes: 3
Views: 6797
Reputation: 38415
Pandas read_csv is very versatile, you can use it with delim_whitespace = True to handle variable number of whitespaces.
df = pd.read_csv(filename, delim_whitespace=True)
Option 2: Use separator argument
df = pd.read_csv(filename, sep='\t+')
Upvotes: 7
Reputation: 3802
If I use this code:
import pandas as pd
parsed_csv_txt = pd.read_csv("tabbed.txt",sep="\t")
print(parsed_csv_txt)
On this file:
a b c d e
14.69 2452 982 234 12
14.11 5435 234 12
16.63 1 12 66
I get:
a b c d e
0 14.69 2452 982.0 234.0 12
1 14.11 5435 234.0 NaN 12
2 16.63 1 NaN 12.0 66
Are there any issues with the output that we see here?
If you would like a different output such as:
a b c d e
0 14.69 2452 982 234 12.0
1 14.11 5435 234 12 NaN
2 16.63 1 12 66 NaN
Use this code:
import pandas as pd
parsed_csv_txt = pd.read_csv("tabbed.txt",delim_whitespace=True)
print(parsed_csv_txt)
Note
For a longer discussion around the topic of variable amounts of whitespace between values check out this discussion: Can pandas handle variable-length whitespace as column delimiters
Upvotes: 0