Reputation: 967

Convert a space delimited file to comma separated values file in python

I am very new to Python. I know that this has already been asked, and I apologise, but the difference here is that the number of spaces between fields varies. I have a file, named coord, that contains the following space-delimited strings:

   1  C       6.00    0.000000000    1.342650315    0.000000000
   2  C       6.00    0.000000000   -1.342650315    0.000000000
   3  C       6.00    2.325538562    2.685300630    0.000000000
   4  C       6.00    2.325538562   -2.685300630    0.000000000
   5  C       6.00    4.651077125    1.342650315    0.000000000
   6  C       6.00    4.651077125   -1.342650315    0.000000000
   7  C       6.00   -2.325538562    2.685300630    0.000000000
   8  C       6.00   -2.325538562   -2.685300630    0.000000000
   9  C       6.00   -4.651077125    1.342650315    0.000000000
  10  C       6.00   -4.651077125   -1.342650315    0.000000000
  11  H       1.00    2.325538562    4.733763602    0.000000000
  12  H       1.00    2.325538562   -4.733763602    0.000000000
  13  H       1.00   -2.325538562    4.733763602    0.000000000
  14  H       1.00   -2.325538562   -4.733763602    0.000000000
  15  H       1.00    6.425098097    2.366881801    0.000000000
  16  H       1.00    6.425098097   -2.366881801    0.000000000
  17  H       1.00   -6.425098097    2.366881801    0.000000000
  18  H       1.00   -6.425098097   -2.366881801    0.000000000

Please note the leading spaces before the first column on each line. I tried the following to convert it to CSV:

with open('coord') as infile, open('coordv', 'w') as outfile:
    outfile.write(infile.read().replace("  ", ", "))

# Unneeded columns are deleted from the csv

input = open('coordv', 'rb')
output = open('coordcsvout', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    if row:
        writer.writerow(row)
input.close()
output.close()

with open("coordcsvout","rb") as source:
    rdr= csv.reader( source )
    with open("coordbarray","wb") as result:
        wtr= csv.writer(result)
        for r in rdr:
            wtr.writerow( (r[5], r[6], r[7]) )

When I run the script, I get the following for the coordv in the very first part of the script, which is of course very wrong:

,  1, C, , ,  6.00, , 0.000000000, , 1.342650315, , 0.000000000
,  2, C, , ,  6.00, , 0.000000000,  -1.342650315, , 0.000000000
,  3, C, , ,  6.00, , 2.325538562, , 2.685300630, , 0.000000000
,  4, C, , ,  6.00, , 2.325538562,  -2.685300630, , 0.000000000
,  5, C, , ,  6.00, , 4.651077125, , 1.342650315, , 0.000000000
,  6, C, , ,  6.00, , 4.651077125,  -1.342650315, , 0.000000000
,  7, C, , ,  6.00,  -2.325538562, , 2.685300630, , 0.000000000
,  8, C, , ,  6.00,  -2.325538562,  -2.685300630, , 0.000000000
,  9, C, , ,  6.00,  -4.651077125, , 1.342650315, , 0.000000000
, 10, C, , ,  6.00,  -4.651077125,  -1.342650315, , 0.000000000
, 11, H, , ,  1.00, , 2.325538562, , 4.733763602, , 0.000000000
, 12, H, , ,  1.00, , 2.325538562,  -4.733763602, , 0.000000000
, 13, H, , ,  1.00,  -2.325538562, , 4.733763602, , 0.000000000
, 14, H, , ,  1.00,  -2.325538562,  -4.733763602, , 0.000000000
, 15, H, , ,  1.00, , 6.425098097, , 2.366881801, , 0.000000000
, 16, H, , ,  1.00, , 6.425098097,  -2.366881801, , 0.000000000
, 17, H, , ,  1.00,  -6.425098097, , 2.366881801, , 0.000000000
, 18, H, , ,  1.00,  -6.425098097,  -2.366881801, , 0.000000000

I have tried different arguments to .replace without any success, and so far I haven't found any information on how to do this. What would be the best way to get comma-separated values from this coord file? I would then like to use the csv module in Python to select columns 4:6, and finally use numpy to import them as follows:

from numpy import genfromtxt
cocmatrix = genfromtxt('input', delimiter=',')

I'd be very glad if somebody could help me with this problem.

Upvotes: 15

Answers (8)

Ranjeet R Patil

Reputation: 491

To merge multiple text files into one CSV:

import csv

n = 3  # placeholder: number of input files (input0.txt ... input2.txt)
for x in range(n):
    # 'a' appends, so each input file is added to the same output.csv
    with open('input{}.txt'.format(x)) as fin, open('output.csv', 'a') as fout:
        csv_output = csv.writer(fout)
        for line in fin:
            csv_output.writerow(line.split())

Upvotes: 1

Majid Hoseiny

Reputation: 21

To convert " " to ",":

Just change the filenames to your own.

with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(" ", ","))

To convert "," to " ":

with open('filename') as infile, open('output', 'w') as outfile:
    outfile.write(infile.read().replace(",", " "))
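
One caveat with a plain replace: it substitutes every single space, so it only gives clean CSV when fields are separated by exactly one space. With runs of spaces, as in the coord file from the question, each extra space becomes an empty field. A small sketch of the difference (the sample line is shortened for illustration):

```python
line = "  1  C   6.00"

# Every space becomes a comma, so runs of spaces yield empty fields.
replaced = line.replace(" ", ",")

# split() with no argument treats any run of whitespace as one separator
# and ignores leading whitespace.
joined = ",".join(line.split())

print(replaced)  # ,,1,,C,,,6.00
print(joined)    # 1,C,6.00
```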

Upvotes: 2

j011y

Reputation: 111

Replace your first bit with this. It's not super pretty, but it will give you a CSV format.

with open('coord') as infile, open('coordv', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
        outfile.write(",") # trailing comma shouldn't matter

As written, everything ends up on a single line. If you want each record on its own line in the outfile, you could add outfile.write("\n") at the end of the for loop, but I don't think the code that follows this will work with it like that.

Upvotes: 7

dstromberg

Reputation: 7167

The csv module is good, or here's a way to do it without:

#!/usr/local/cpython-3.3/bin/python

with open('input-file.csv', 'r') as infile, open('output.csv', 'w') as outfile:
    for line in infile:
        fields = line.split()
        outfile.write('{}\n'.format(','.join(fields)))

Upvotes: 0

user1667218

Reputation: 49

>>> a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'
=>  a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'

>>> a.split()
=>  ['cah', '1', 'C', '6.00', '0.000000000', '1.342650315', '0.000000000']

>>> ','.join(a.split())
=>  'cah,1,C,6.00,0.000000000,1.342650315,0.000000000'

>>> ['"' + x + '"' for x in a.split()]
=>  ['"cah"', '"1"', '"C"', '"6.00"', '"0.000000000"', '"1.342650315"', '"0.000000000"']

>>> ','.join(['"' + x + '"' for x in a.split()])
=>  '"cah","1","C","6.00","0.000000000","1.342650315","0.000000000"'
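
As a possible alternative to quoting by hand, the csv module can quote every field itself via csv.QUOTE_ALL. This is only a sketch and not part of the session above; it writes to an in-memory buffer so it can be run as is:

```python
import csv
import io

a = 'cah  1  C       6.00    0.000000000    1.342650315    0.000000000'

buf = io.StringIO()
# QUOTE_ALL wraps every field in double quotes; lineterminator is set
# to '\n' here only to keep the output easy to read.
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(a.split())

quoted = buf.getvalue()
print(quoted)
```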

Upvotes: 1

user1667218

Reputation: 49

Why not read the file line by line? Split each line into a list, then rejoin the list with ','.
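
A minimal sketch of that idea, assuming whitespace-delimited input like the coord file from the question. The names space_to_csv_line and convert are made up for illustration:

```python
def space_to_csv_line(line):
    """Collapse any run of whitespace into a single comma."""
    # str.split() with no argument splits on runs of whitespace and
    # ignores leading/trailing whitespace, so no empty fields appear.
    return ','.join(line.split())


def convert(in_path, out_path):
    """Rewrite a whitespace-delimited file as comma-separated lines."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(space_to_csv_line(line) + '\n')


print(space_to_csv_line(
    '   1  C       6.00    0.000000000    1.342650315    0.000000000'))
# prints: 1,C,6.00,0.000000000,1.342650315,0.000000000
```

Calling convert('coord', 'coord.csv') would then write one comma-separated record per input line; the output filename here is only an example.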

Upvotes: 0

Daniel

Reputation: 19547

You can use pandas. I have written your data to data.csv:

>>> import pandas as pd
>>> df = pd.read_csv('data.csv', sep=r'\s+', header=None)
>>> df
     0  1  2         3         4  5
0    1  C  6  0.000000  1.342650  0
1    2  C  6  0.000000 -1.342650  0
2    3  C  6  2.325539  2.685301  0
3    4  C  6  2.325539 -2.685301  0
4    5  C  6  4.651077  1.342650  0
5    6  C  6  4.651077 -1.342650  0
...

The great thing about this is that you can access the underlying numpy array with df.values:

>>> type(df.values)
<type 'numpy.ndarray'>

To save the data frame with comma delimiters:

>>> df.to_csv('data_out.csv',header=None)

Pandas is a great library for managing large amounts of data, and as a bonus it works well with numpy. There is also a very good chance that this will be much faster than using the csv module.

Upvotes: 9

the wolf

Reputation: 35522

You can use csv:

import csv

with open(ur_infile) as fin, open(ur_outfile, 'w') as fout:
    o=csv.writer(fout)
    for line in fin:
        o.writerow(line.split())
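
If only the coordinates are needed, the same loop can slice each row before writing it. This is a sketch rather than part of the original answer: it assumes the x, y, z values are the last three fields of every line, as in the sample from the question, and the name write_xyz is hypothetical.

```python
import csv
import io


def write_xyz(fin, fout):
    """Copy the last three whitespace-separated fields of each line to CSV."""
    writer = csv.writer(fout, lineterminator='\n')
    for line in fin:
        fields = line.split()
        if len(fields) >= 3:  # skip blank or short lines
            writer.writerow(fields[-3:])


# In-memory stand-ins for the input and output files.
sample = io.StringIO(
    "   1  C       6.00    0.000000000    1.342650315    0.000000000\n"
    "   2  C       6.00    0.000000000   -1.342650315    0.000000000\n"
)
out = io.StringIO()
write_xyz(sample, out)
print(out.getvalue())
```

With real files on Python 3, the csv documentation recommends opening the output with newline=''. The resulting file should then be readable with genfromtxt(..., delimiter=',') as in the question.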

Upvotes: 17
