student

Reputation: 21

Python write a new csv file by filtering selected rows from an existing csv file

I am trying to write selected rows from one .csv file to a new .csv file, but I get an error.

The test.csv file that I am trying to read looks like this (two columns):

2013-9     1
2013-10    2
2013-11    3
2013-12    4
2014-1     5
2014-2     6
2014-3     7
2014-4     8
2014-5     9

I only want the rows for the year 2014. Here is my code:

import re
import csv

write_flag=0
string_storage=[]
rad_file=open('year.csv')

for rad_line in rad_file:
        if write_flag==1:
                string_storage.append(rad_line)
        if (rad_line.has_key('2014')):
                write_flag=1
        if (rad_line.has_key('2013')):
                write_flag=0
rad_file.close()

out_file = open("try.csv","w")
for temp_string in string_storage:
    out_file.write(temp_string)
out_file.close()

However, the error is: AttributeError: 'str' object has no attribute 'has_key'

I have no idea what the correct way to program this is. Please help, I am new to Python. Thanks.

Upvotes: 0

Views: 2857

Answers (3)

unutbu

Reputation: 879093

The error could be "fixed" by changing has_key to startswith, but more importantly, the way the program is currently written, you'll skip the first line which starts with 2014, and include the first line of subsequent groups that starts with 2013. Is that really what you want?
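
To make the flag behaviour concrete, here is a small sketch that keeps the question's logic but swaps has_key for startswith. The function name flag_filter and the in-memory sample lines are made up for illustration; they are not part of the original program.

```python
def flag_filter(lines):
    """Reproduce the question's flag logic, with startswith in place of has_key."""
    write_flag = 0
    kept = []
    for line in lines:
        # The append happens BEFORE the flag is updated for the current line,
        # so the flag always reflects the previous line, not this one.
        if write_flag == 1:
            kept.append(line)
        if line.startswith('2014'):
            write_flag = 1
        if line.startswith('2013'):
            write_flag = 0
    return kept


# Hypothetical input: a 2014 group followed by another 2013 group.
sample = ['2013-12 4', '2014-1 5', '2014-2 6', '2013-9 1', '2013-10 2']
kept = flag_filter(sample)
print(kept)  # ['2014-2 6', '2013-9 1']
```

The first 2014 line is dropped and the first 2013 line after the 2014 group is kept, which is probably not what was intended.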

If instead you simply want to keep all lines that begin with 2014, then:

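with open('year.csv') as rad_file, open("try.csv", "w") as out_file:
    # Copy the first line through unchanged, assuming it is a header row.
    # Drop these two lines if the file has no header, as in the sample shown.
    header = next(rad_file)
    out_file.write(header)
    for rad_line in rad_file:
        # Keep only the lines whose text begins with '2014'.
        if rad_line.startswith('2014'):
            out_file.write(rad_line)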

By processing each line as it is read, you avoid accumulating lines in the list string_storage, thus saving memory. That can be important when processing large files.


Also, if you use a with-statement to open your files, then the file will be automatically closed for you when the flow of execution leaves the with-statement.
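
A quick way to see the automatic close is to inspect the file object's closed attribute inside and after the with-block. This is only a sketch: the helper write_rows and the scratch path under a temporary directory are assumptions made for the demonstration.

```python
import os
import tempfile


def write_rows(path, rows):
    """Write rows to path; report whether the file was closed inside and after the with-block."""
    with open(path, 'w') as out_file:
        for row in rows:
            out_file.write(row)
        closed_inside = out_file.closed   # still open while inside the block
    closed_after = out_file.closed        # closed automatically on leaving the block
    return closed_inside, closed_after


# Hypothetical scratch file, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'try.csv')
state = write_rows(demo_path, ['2014-1     5\n', '2014-2     6\n'])
print(state)  # (False, True)
```

The same holds if an exception is raised inside the block, which is the main advantage over calling close() by hand.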


Note that in Python 2, dicts have a has_key method that checks whether the dict has a certain key.

The code raised an error because rad_line is a string, not a dict.

The has_key method was removed in Python 3. In later Python 2 releases such as Python 2.7, you never need has_key, because key in dict is preferred over dict.has_key(key).
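
As a short illustration of the difference, the in operator works on both types but means different things: a key test on a dict, a substring test on a str. The dict counts below is a made-up example, not something from the question.

```python
# A dict maps keys to values; `in` tests whether a key is present.
counts = {'2013': 4, '2014': 5}           # hypothetical data
has_2014_key = '2014' in counts           # replaces counts.has_key('2014')

# A line read from a file is a str; `in` tests for a substring instead.
rad_line = '2014-1     5\n'
contains_2014 = '2014' in rad_line
starts_with_2014 = rad_line.startswith('2014')

# Strings never had a has_key method, which is what the AttributeError reports.
str_has_method = hasattr(rad_line, 'has_key')

print(has_2014_key, contains_2014, starts_with_2014, str_has_method)
```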

Upvotes: 2

Burhan Khalid

Reputation: 174614

Since you are using the csv module anyway, why not write the file as you are reading it in:

import csv

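with open('in.csv', 'r') as i, open('out.csv', 'w') as o:
    # delimiter='\t' assumes the columns are tab-separated
    r = csv.reader(i, delimiter='\t')
    w = csv.writer(o, delimiter='\t')
    for row in r:
        # row[0] looks like '2014-1'; the part before the '-' is the year
        if row[0].split('-')[0] == '2014':
            w.writerow(row)  # csv writer objects provide writerow(), not write()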
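
If the columns turn out to be separated by runs of spaces rather than tabs (the sample in the question could be read either way), one possible variant is to split each line on whitespace and only use csv for writing. This is a sketch under that assumption; filter_year is a hypothetical helper, and io.StringIO stands in for the real files so the example is self-contained.

```python
import csv
import io


def filter_year(in_file, out_file, year):
    """Copy rows whose first field starts with the given year to out_file as CSV."""
    writer = csv.writer(out_file, lineterminator='\n')
    for line in in_file:
        fields = line.split()          # split on any run of whitespace
        # Skip blank lines, then compare the part before the '-' with the year.
        if fields and fields[0].split('-')[0] == year:
            writer.writerow(fields)


# In-memory stand-ins for the input and output files.
source = io.StringIO('2013-12    4\n2014-1     5\n2014-2     6\n')
target = io.StringIO()
filter_year(source, target, '2014')
print(target.getvalue())
```

With real files, pass the objects returned by open() instead of the StringIO buffers.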

Upvotes: 3

piokuc

Reputation: 26164

Use str.find or regular expressions to find a substring in a string.

So instead of

if (rad_line.has_key('2014')):

you can do:

if (rad_line.find('2014') != -1):
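
For the regular-expression route, a minimal sketch might look like the following. Note that find matches '2014' anywhere in the line, while re.match only matches at the start; the pattern and the helper name is_2014 are assumptions based on the sample data in the question.

```python
import re

# Assumed line format: year, '-', one or two month digits, e.g. '2014-1' or '2014-12'.
YEAR_2014 = re.compile(r'2014-\d{1,2}\b')


def is_2014(line):
    """Return True if the line starts with a 2014 year-month stamp."""
    # Pattern.match anchors at the beginning of the string.
    return YEAR_2014.match(line) is not None


tricky = '2013-9     2014\n'           # 2013 row whose value happens to be 2014
found_anywhere = tricky.find('2014') != -1
print(is_2014('2014-1     5\n'), is_2014(tricky), found_anywhere)  # True False True
```

The last line shows why a plain substring search can select the wrong rows when the value column may contain the year.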

Upvotes: 1
