Reputation: 5
I am looking to remove rows from a csv file if they contain specific strings anywhere in the row.
I'd like to write the result to a new output file rather than overwrite the original.
I need to remove any rows that contain "py-board" or "coffee".
Example Input:
173.20.1.1,2-base
174.28.2.2,2-game
174.27.3.109,xyz-b13-coffee-2
174.28.32.8,2-play
175.31.4.4,xyz-102-o1-py-board
176.32.3.129,xyz-b2-coffee-1
177.18.2.8,six-jump-walk
Expected Output:
173.20.1.1,2-base
174.28.2.2,2-game
174.28.32.8,2-play
177.18.2.8,six-jump-walk
I tried this, based on "Deleting rows with Python in a CSV file":
import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board" or if row[1] != "coffee":
            writer.writerow(row)
and I tried this
import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board":
            if row[1] != "coffee":
                writer.writerow(row)
and this
if row[1][-8:] != "py-board":
    if row[1][-8:] != "coffee-1":
        if row[1][-8:] != "coffee-2":
but got this error
  File "C:\testing\syslogyamlclean.py", line 6, in <module>
    for row in csv.reader(inp):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Upvotes: 0
Views: 4245
Reputation: 562
I would actually not use the csv package for this. It can be done easily with standard file reading and writing.
Try this code (I have written some comments to make it self-explanatory):
# We open the source file and get its lines
with open('input_csv_file.csv', 'r') as inp:
    lines = inp.readlines()

# We open the target file in write-mode
with open('purged_csv_file.csv', 'w') as out:
    # We go line by line writing in the target file
    # if the original line does not include the
    # strings 'py-board' or 'coffee'
    for line in lines:
        if 'py-board' not in line and 'coffee' not in line:
            out.write(line)
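If you would rather stay with the csv module, note that the error in the question comes from opening the files in binary mode ('rb' / 'wb'). In Python 3 the csv module expects text-mode files, ideally opened with newline=''. Below is a minimal sketch of that variant. The function name purge_rows and the BANNED tuple are my own names for illustration, and it assumes the strings only need to be checked in the second column, as in the sample data.

```python
import csv

# Substrings that mark a row for removal (taken from the question).
BANNED = ("py-board", "coffee")


def purge_rows(src_path, dst_path, banned=BANNED):
    """Copy src_path to dst_path, skipping rows whose second column
    contains any of the banned substrings."""
    # Text mode with newline='' is what the csv module expects in Python 3.
    with open(src_path, "r", newline="") as inp, \
            open(dst_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in csv.reader(inp):
            # Rows with fewer than two columns are kept unchanged.
            if len(row) > 1 and any(term in row[1] for term in banned):
                continue
            writer.writerow(row)
```

You would call it as purge_rows('input_csv_file.csv', 'purged_csv_file.csv'). Unlike the != comparisons in the question, the "in" test matches the string anywhere inside the field, which is what the sample data needs.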
Upvotes: 2
Reputation: 2223
# pandas helps to read and manipulate .csv file
import pandas as pd
# read .csv file
df = pd.read_csv('input_csv_file.csv', sep=',', header=None)
df
              0                    1
0    173.20.1.1               2-base
1    174.28.2.2               2-game
2  174.27.3.109     xyz-b13-coffee-2
3   174.28.32.8               2-play
4    175.31.4.4  xyz-102-o1-py-board
5  176.32.3.129      xyz-b2-coffee-1
6    177.18.2.8        six-jump-walk
# filter rows
# (~ negates the boolean mask, so numpy does not need to be imported)
result = df[~(df[1].str.contains('py-board') | df[1].str.contains('coffee'))]
print(result)
             0              1
0   173.20.1.1         2-base
1   174.28.2.2         2-game
3  174.28.32.8         2-play
6   177.18.2.8  six-jump-walk
# save to result.csv file
result.to_csv('result.csv', index=False, header=False)
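If the list of strings to exclude grows, the filter can be generalized. The following is only a sketch: drop_rows_containing is a hypothetical helper name, and it assumes the column holds strings. The terms are escaped with re.escape so they are matched literally, and na=False makes missing values count as "no match" so those rows are kept.

```python
import re

import pandas as pd


def drop_rows_containing(df, column, terms):
    """Return the rows of df whose `column` contains none of `terms`."""
    # Escape each term so regex metacharacters are matched literally.
    pattern = "|".join(re.escape(term) for term in terms)
    # na=False treats missing values as "no match", so those rows are kept.
    mask = df[column].str.contains(pattern, regex=True, na=False)
    return df[~mask]
```

With the DataFrame above, drop_rows_containing(df, 1, ['py-board', 'coffee']) should give the same rows as the filter in this answer.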
Upvotes: 0