Reputation: 925
This question continues from my previous question. Thanks to many people's help, I was able to modify my code as below.
import csv
with open("SURFACE2", "rb") as infile, open("output.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter=" ")
    writer = csv.writer(outfile, delimiter=" ")
    for row in reader:
        row[18] = "999"
        writer.writerow(row)
I just changed the delimiter from "\t" to " ". With the previous delimiter the code only worked up to row[0]; with " " it works up to row[18].
15.20000 120.60000 98327 get data information here. SURFACE DATA FROM ??????????? SOURCE FM-12 SYNOP 155.00000 1 0 0 0 0 T F F -888888 -888888 20020601030000 100820.00000
From the data line above, row[18] is just in the middle between 15.20000 and 120.60000.
I am not sure what happens between these two values. Maybe the delimiter changes? Visually, however, I can't see any difference. Is there any way to tell whether the delimiter changed and, if so, do you have any idea how to handle multiple delimiters in one piece of code?
Any idea or help would be really appreciated.
Thank you, Isaac
The results from repr(next(infile)):
' 15.20000 120.60000 98327 get data information here. SURFACE DATA FROM ??????????? SOURCE FM-12 SYNOP 155.00000 1 0 0 0 0 T F F -888888 -888888 20020601030000 100820.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0\n'
' 99070.00000 0 155.00000 0 303.20001 0 297.79999 0 3.00000 0 140.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0\n'
'-777777.00000 0-777777.00000 0 1.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0\n'
' 1 0 0\n'
' 55.10000 -3.60000 03154 get data information here. SURFACE DATA FROM ??????????? SOURCE FM-12 SYNOP 16.00000 1 0 0 0 0 T F F -888888 -888888 20020601030000-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0\n'
'-888888.00000 0 16.00000 0 281.20001 0 279.89999 0 0.00000 0 0.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0\n'
'-777777.00000 0-777777.00000 0 1.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0\n'
' 1 0 0\n'
As you can see, the first four lines should actually be one line. For some reason the full line seems to be divided into four parts. Do you have any idea why? Thank you, Isaac
Upvotes: 0
Views: 582
Reputation: 87064
N.B. The file format is discussed on page 19 of this document. This more-or-less agrees with the sample data.
EDIT
OK, after considering the various comments, additional answers, and reading the original question, it would seem that the file in question is not a CSV file. It is weather observation data formatted as "little_r", which uses fixed-width fields padded with spaces. There is not much info available so I'm guessing, but each group of 4 lines seems to comprise a single observation. From your previous question it seems that you want to update the 3rd column in the first line? The other 3 lines would be skipped. Then update the 3rd column in the first line of the next set of 4 lines, and so on.
An example from the OP:
15.20000 120.60000 98327 get data information here. SURFACE DATA FROM ??????????? SOURCE FM-12 SYNOP 155.00000 1 0 0 0 0 T F F -888888 -888888 20020601030000 100820.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0 99070.00000 0 155.00000 0 303.20001 0 297.79999 0 3.00000 0 140.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0 -777777.00000 0-777777.00000 0 1.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0-888888.00000 0 1 0 0
The first 2 columns of the first line are (I'm guessing) the latitude and longitude for the observations. I have no idea what the 3rd column 98327 is, but this is the column that the OP wants to update (based on previous question).
It's not a CSV file, so don't process it as one. Instead, because there are fixed-width fields, we know the offset and width of the field that needs to be updated. Based on the sample data the 3rd column starts at zero-based offset 41 and is 5 characters wide, i.e. it occupies the slice 41:46. So, to update the data and write to a new file:
offset_col_3 = 41
length_col_3 = 5
with open('SURFACE2') as infile, open('output.txt', 'w') as outfile:
    for line_no, line in enumerate(infile):
        if line_no % 4 == 0:    # every 4th line starting with the first
            line = '{}{:>5}{}'.format(line[:offset_col_3], 999, line[offset_col_3+length_col_3:])
        outfile.write(line)
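If the offset is not known in advance, one way to find it is to look up the span of the third whitespace-separated token on a header line. The following is only a sketch under assumptions: `token_span` and `replace_token` are hypothetical helper names, and the sample `header` is built to mimic the layout guessed above (two 20-character numeric columns, one space, then the ID), not read from the real file.

```python
import re

def token_span(line, index):
    """Return (start, end) of the index-th whitespace-separated token."""
    spans = [m.span() for m in re.finditer(r"\S+", line)]
    return spans[index]

def replace_token(line, index, value):
    """Overwrite one token in place, right-aligned, keeping the line length."""
    start, end = token_span(line, index)
    width = end - start
    text = str(value)
    if len(text) > width:
        raise ValueError("replacement is wider than the field")
    return line[:start] + text.rjust(width) + line[end:]

# Assumed layout: two right-aligned 20-character columns, a space, then the ID.
header = "{:>20}{:>20} {:<40}\n".format("15.20000", "120.60000", "98327")

print(token_span(header, 2))            # span of the ID column
print(repr(replace_token(header, 2, 999)))
```

Because the replacement is padded to the width of the token it overwrites, every other column stays at its original position. This only works when the target column is separated from its neighbours by whitespace, which is not true for all fields in the sample data.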
Original answer
Try reading line 20 (row[19]) (assuming no header line in the CSV file, otherwise line 21) from the file and inspecting it in Python:
with open("SURFACE2") as infile:
    for i in range(20):
        print(repr(next(infile)))
The last line displayed will be line 20 (row[19]). If, for example, tabs are delimiters then you might see \t in between the columns of data. Compare the previous line to the last line to see if there is a difference in the delimiter used.
If you find that your CSV file is mixing delimiters, then you might have to split the fields manually.
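As a rough illustration of splitting manually, a regular expression can treat any run of spaces and tabs as a single separator. This is a sketch, not a drop-in fix: `split_mixed` is a hypothetical helper, the sample line is made up, and the approach discards empty fields and cannot separate values that touch each other without any delimiter.

```python
import re

def split_mixed(line):
    """Split a line on runs of spaces and/or tabs, ignoring the line ending."""
    stripped = line.strip()
    if not stripped:
        return []
    return re.split(r"[ \t]+", stripped)

# Made-up sample mixing tabs and spaces between fields.
sample = "15.20000\t120.60000  98327 \t FM-12\n"
print(split_mixed(sample))
```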
Upvotes: 2
Reputation: 6684
The csv module is not the right tool to use when you have fixed-width fields in your file. What you need to do is explicitly use the field lengths to split up the lines. For example:
# This would be your whole file
data = "\n".join([
    "abc  def gh i",
    "jk   lm  n  o",
    "p    q   r  s",
])
field_widths = [5, 4, 3, 1]

def fields(line, field_widths):
    # Yield each fixed-width field with its padding stripped
    pos = 0
    for length in field_widths:
        yield line[pos:pos + length].strip()
        pos += length

for line in data.split("\n"):
    print(list(fields(line, field_widths)))
will give you:
['abc', 'def', 'gh', 'i']
['jk', 'lm', 'n', 'o']
['p', 'q', 'r', 's']
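The same list of widths can also be used to write a field back without disturbing its neighbours. A minimal sketch, assuming Python 3 (for `itertools.accumulate`) and left-aligned padding; `replace_field` is a hypothetical helper, not part of any library.

```python
from itertools import accumulate

def replace_field(line, field_widths, index, value):
    """Return line with field `index` replaced, padded to the field's width."""
    # Cumulative sums of the widths give the start offset of every field.
    starts = [0] + list(accumulate(field_widths))
    start, end = starts[index], starts[index + 1]
    text = str(value)
    if len(text) > end - start:
        raise ValueError("replacement is wider than the field")
    return line[:start] + text.ljust(end - start) + line[end:]

field_widths = [5, 4, 3, 1]
print(repr(replace_field("abc  def gh i", field_widths, 1, "xy")))
```

Rejecting values that are too wide is a deliberate choice here: silently truncating or overflowing would shift every later column in a fixed-width record.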
Upvotes: 1