user3609179
user3609179

Reputation: 301

Parsing an irregularly spaced text file in Python pandas

I have a text file that looks like :

Date     Fruit-type  Color         count
aug-6     apple  green         4
aug-7     pear  brown         5
aug-3     peach  yellow         10
aug-29     orange  orange         34

I would like to parse it to remove the irregular spaces into a nicely formatted pandas dataframe. I thought to remove the spaces and replace them with another delimiter but could not figure out the logic.

Desired output

Date,Fruit-type,Color,count
aug-6,apple,green,4
aug-7,pear,brown,5
aug-3,peach,yellow,10
aug-29,orange,orange,34

Upvotes: 1

Views: 1953

Answers (2)

Claes Wikner
Claes Wikner

Reputation: 1517

gawk '{gsub(/[[:blank:]]+/, ",")}1' file

Date,Fruit-type,Color,count
aug-6,apple,green,4
aug-7,pear,brown,5
aug-3,peach,yellow,10
aug-29,orange,orange,34

Upvotes: 0

alexbclay
alexbclay

Reputation: 1429

If you can use command line tools, you can run this awk command to turn it from space delimited to comma delimited.

awk '{for (i=1; i<NF; i++){printf "%s,", $i} print $NF}' data.txt

Otherwise, pandas can import space delimited files easily.

import pandas as pd

frame = pd.read_table('data.txt', sep='\s+')

With data.txt as:

Date     Fruit-type  Color         count
aug-6     apple  green         4
aug-7     pear  brown         5
aug-3     peach  yellow         10
aug-29     orange  orange         34

The output is

     Date Fruit-type   Color  count
0   aug-6      apple   green      4
1   aug-7       pear   brown      5
2   aug-3      peach  yellow     10
3  aug-29     orange  orange     34

You can read more here: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files

Upvotes: 5

Related Questions