deepayan das

Reputation: 1657

How do I load a text file into a pandas dataframe?

I have a text file which looks something like this:

 101   the   323
 103   to    324
 104   is    325

where the delimiter is four spaces. I am trying to use the read_csv function in order to convert it into a pandas DataFrame:

data = pd.read_csv('file.txt', sep=" ", header=None)

However, it gives me a lot of NaN values:

    101\tthe\tthe\t10115  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
     102\tto\tto\t5491  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
     103\tof\tof\t4767  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
       104\ta\ta\t4532  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  Na

Is there any way I can read the text file into a correctly parsed DataFrame?

Upvotes: 1

Views: 20057

Answers (2)

jezrael

Reputation: 862581

If the separator needs to be exactly four whitespace characters:

data = pd.read_csv('file.txt', sep=r"\s{4}", header=None, engine='python')
print (data)
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
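
To see why the original sep=" " call produced extra NaN columns, here is a minimal sketch. The sample string is a hypothetical stand-in for file.txt (its real contents are an assumption), with exactly four spaces between fields:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.txt, with exactly four spaces between fields.
sample = (
    "101    the    323\n"
    "103    to    324\n"
    "104    is    325\n"
)

# A single-space separator treats every space as a field boundary,
# so each run of four spaces yields empty fields that show up as NaN columns.
bad = pd.read_csv(io.StringIO(sample), sep=" ", header=None)
print(bad.shape)

# A regex separator matching exactly four whitespace characters
# (raw string, Python engine) splits each line into three fields.
good = pd.read_csv(io.StringIO(sample), sep=r"\s{4}", header=None, engine="python")
print(good)
```

The raw-string prefix keeps Python from interpreting \s as a string escape before the pattern reaches pandas.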

Or use the parameter delim_whitespace=True (thanks carthurs) or \s+ if the separator is one or more whitespace characters:

data = pd.read_csv('file.txt', sep=r"\s+", header=None)
# delim_whitespace is deprecated since pandas 2.2; prefer sep=r"\s+" on newer versions
data = pd.read_csv('file.txt', delim_whitespace=True, header=None)

But if the separator is a tab:

data = pd.read_csv('file.txt', sep="\t", header=None)
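
The \t sequences visible in the question's NaN output suggest the file may actually be tab-separated. A minimal sketch for checking which delimiter a file really uses before choosing sep follows; the sample string is a hypothetical stand-in for file.txt, not its real contents:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.txt; here the fields are separated by tabs.
sample = "101\tthe\t323\n103\tto\t324\n104\tis\t325\n"

# repr() makes invisible delimiters visible: a tab is printed as \t.
first_line = sample.splitlines()[0]
print(repr(first_line))

# Choose the separator based on what the raw line actually contains.
sep = "\t" if "\t" in first_line else r"\s+"
data = pd.read_csv(io.StringIO(sample), sep=sep, header=None)
print(data)
```

With a real file, the first line could be read with open('file.txt').readline() instead of the in-memory sample.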

Upvotes: 7

EdChum

Reputation: 393993

You have a fixed-width file, so you can use read_fwf, which will infer the column layout from the file:

In[79]:
pd.read_fwf('file.txt', header=None)

Out[79]: 
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
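
A self-contained sketch of the same idea, assuming the file really is space-padded so that each field starts at the same character offset. The sample string and the explicit colspecs offsets are illustrative assumptions, not taken from the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for file.txt: fields start at offsets 0, 6 and 12.
sample = (
    "101   the   323\n"
    "103   to    324\n"
    "104   is    325\n"
)

# With the default colspecs="infer", read_fwf guesses the column
# boundaries from the whitespace layout of the first rows.
df = pd.read_fwf(io.StringIO(sample), header=None)
print(df)

# The boundaries can also be given explicitly as half-open (start, end) offsets.
df_explicit = pd.read_fwf(
    io.StringIO(sample),
    colspecs=[(0, 3), (6, 9), (12, 15)],
    header=None,
)
print(df_explicit)
```

If the file turns out to be tab-delimited rather than padded with spaces, read_csv with sep="\t" is likely the better fit.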

Upvotes: 3
