Reputation: 5767
I have some input files that look something like this:
GENE CHR START STOP NSNPS NPARAM N ZSTAT P
2541473 1 1109286 1133315 2 1 15000 3.8023 7.1694e-05
512150 1 1152288 1167447 1 1 15000 3.2101 0.00066347
3588581 1 1177826 1182102 1 1 15000 3.2727 0.00053256
I am importing the file like this:
df = pd.read_csv('myfile.out', sep='\t')
But all the data gets read into a single column. I have tried setting the file encoding with encoding='utf-8', encoding='utf-16-le', and encoding='utf-16-be', but this does not work. Using sep=' ' does split the data, but into too many columns. Is there a way to read this data in correctly?
Upvotes: 0
Views: 1408
Reputation:
Try using \s+ (which reads as "one or more whitespace characters") as your delimiter:
df = pd.read_csv('myfile.out', sep=r'\s+')
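To check this without touching the real file, here is a minimal sketch that parses the sample rows from the question out of an in-memory string. The padding with several spaces between fields is an assumption about how the real file is laid out, which would also explain why sep=' ' produced too many columns: each extra space is treated as another delimiter. The variable name `sample` is just for illustration.

```python
import io

import pandas as pd

# Rows copied from the question. The multi-space padding between fields is an
# assumption about the real file, not something the question confirms.
sample = """GENE     CHR  START    STOP     NSNPS  NPARAM  N      ZSTAT   P
2541473  1    1109286  1133315  2      1       15000  3.8023  7.1694e-05
512150   1    1152288  1167447  1      1       15000  3.2101  0.00066347
3588581  1    1177826  1182102  1      1       15000  3.2727  0.00053256
"""

# A raw string keeps Python from treating "\s" as a string escape sequence.
# The pattern matches any run of spaces and/or tabs as a single delimiter.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(df.shape)  # (3, 9)
print(df.dtypes)
```

If this gives nine columns on the sample but not on the real file, the file probably contains something other than plain spaces or tabs between the fields.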
Upvotes: 3