Reputation: 301
I have a text file that looks like :
Date Fruit-type Color count
aug-6 apple green 4
aug-7 pear brown 5
aug-3 peach yellow 10
aug-29 orange orange 34
I would like to parse it to remove the irregular spaces into a nicely formatted pandas dataframe. I thought to remove the spaces and replace them with another delimiter but could not figure out the logic.
Desired output
Date,Fruit-type,Color,count
aug-6,apple,green,4
aug-7,pear,brown,5
aug-3,peach,yellow,10
aug-29,orange,orange,34
Upvotes: 1
Views: 1953
Reputation: 1517
gawk '{gsub(/[[:blank:]]+/, ",")}1' file
Date,Fruit-type,Color,count
aug-6,apple,green,4
aug-7,pear,brown,5
aug-3,peach,yellow,10
aug-29,orange,orange,34
Upvotes: 0
Reputation: 1429
If you can use command line tools, you can run this awk
command to turn it from space delimited to comma delimited.
awk '{for (i=1; i<NF; i++){printf "%s,", $i} print $NF}' data.txt
Otherwise, pandas can import space delimited files easily.
import pandas as pd
frame = pd.read_table('data.txt', sep='\s+')
With data.txt as:
Date Fruit-type Color count
aug-6 apple green 4
aug-7 pear brown 5
aug-3 peach yellow 10
aug-29 orange orange 34
The output is
Date Fruit-type Color count
0 aug-6 apple green 4
1 aug-7 pear brown 5
2 aug-3 peach yellow 10
3 aug-29 orange orange 34
You can read more here: http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files
Upvotes: 5