Rakesh Adhikesavan

Reputation: 12826

How to read a whitespace-delimited text file into a DataFrame?

I have a text file that is formatted this way:

A00     0010  00000
A001    0011  00000
A00911  0019  00000
A0100   0020  10000

I want to read this file into a DataFrame, so I tried:

import pandas as pd
path = "data.txt"  # hypothetical placeholder: path to the text file
df = pd.read_csv(path, sep = '\t', header = None)

What I got was a DataFrame with 4 rows and one column.

                         0
0      A00     0010  00000
1      A001    0011  00000
2      A00911  0019  00000
3      A0100   0020  10000

[4 rows x 1 columns]

This is because the values are not separated by "\t". The number of spaces between the columns varies from row to row, depending on the length of the string in the first column.
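A quick way to confirm the diagnosis, assuming the file really uses runs of spaces rather than tabs (the sample lines below are copied from the question): splitting each line on "\t" leaves it whole, while Python's str.split() with no argument splits on any run of whitespace and yields the three expected fields.

```python
# Sample lines copied from the question (assumed to contain spaces, not tabs)
lines = [
    "A00     0010  00000",
    "A001    0011  00000",
    "A00911  0019  00000",
    "A0100   0020  10000",
]

for line in lines:
    # No tab characters are present, so this returns a single field
    by_tab = line.split("\t")
    # With no argument, str.split() splits on runs of any whitespace
    by_whitespace = line.split()
    print(len(by_tab), by_whitespace)
```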

The desired DataFrame should have four rows and three columns.

        0     1      2
0     A00  0010  00000
1    A001  0011  00000
2  A00911  0019  00000
3   A0100  0020  10000

[4 rows x 3 columns]

Upvotes: 1

Views: 252

Answers (2)

Nickil Maveli

Reputation: 29711

You could pass delim_whitespace=True to read_csv, together with dtype=str so that the values are kept as strings (preserving leading zeros such as 0010):

import pandas as pd
df = pd.read_csv(path, delim_whitespace=True, header=None, dtype=str)
df
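A self-contained sketch of this answer, with io.StringIO standing in for the file path (an assumption so the snippet runs without a file on disk). It passes sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True; pandas 2.2 deprecated delim_whitespace in favour of this form, so the regex spelling also avoids a deprecation warning on recent versions.

```python
import io

import pandas as pd

# Stand-in for the file on disk; in practice pass the file path instead
data = io.StringIO(
    "A00     0010  00000\n"
    "A001    0011  00000\n"
    "A00911  0019  00000\n"
    "A0100   0020  10000\n"
)

# sep=r"\s+" splits on any run of spaces or tabs (same as delim_whitespace=True);
# dtype=str keeps values such as "0010" as strings instead of parsing them as ints
df = pd.read_csv(data, sep=r"\s+", header=None, dtype=str)
print(df)
```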

Upvotes: 5

Mohammad Athar

Reputation: 1980

Try using a regular expression as the sep argument:

df = pd.read_csv(path, sep=r' +', header=None, engine='python', dtype=str)
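A runnable sketch of the regex approach, again with io.StringIO standing in for the file (an assumption). A separator longer than one character is interpreted as a regular expression, which the default C parser does not support, so engine="python" is stated explicitly rather than letting pandas fall back with a warning. dtype=str keeps values like 0010 as strings rather than parsing them as integers. Note that " +" matches spaces only; r"\s+" would also accept tabs.

```python
import io

import pandas as pd

# Stand-in for the file on disk; in practice pass the file path instead
data = io.StringIO(
    "A00     0010  00000\n"
    "A001    0011  00000\n"
    "A00911  0019  00000\n"
    "A0100   0020  10000\n"
)

# " +" is a regex meaning "one or more spaces"; regex separators need the
# Python engine, and dtype=str preserves the leading zeros
df = pd.read_csv(data, sep=r" +", header=None, engine="python", dtype=str)
print(df)
```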

Upvotes: 1

Related Questions