PythonDawg

Reputation: 5

Remove rows from CSV file containing certain characters

I am looking to remove rows from a CSV file if they contain specific strings anywhere in the row.

I'd like to write the result to a new output file rather than overwrite the original.

I need to remove any rows that contain "py-board" or "coffee".

Example Input:

173.20.1.1,2-base
174.28.2.2,2-game
174.27.3.109,xyz-b13-coffee-2
174.28.32.8,2-play
175.31.4.4,xyz-102-o1-py-board
176.32.3.129,xyz-b2-coffee-1
177.18.2.8,six-jump-walk

Expected Output:

173.20.1.1,2-base
174.28.2.2,2-game
174.28.32.8,2-play
177.18.2.8,six-jump-walk

I tried the approach from the question "Deleting rows with Python in a CSV file":

import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board" or if row[1] != "coffee":
            writer.writerow(row)

and I tried this:

import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board":
            if row[1] != "coffee":
                writer.writerow(row)

and this:

        if row[1][-8:] != "py-board":
            if row[1][-8:] != "coffee-1":
                if row[1][-8:] != "coffee-2":

but I got this error:

  File "C:\testing\syslogyamlclean.py", line 6, in <module>
    for row in csv.reader(inp):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
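
The traceback comes from opening the files in binary mode ('rb'/'wb'): in Python 3, csv.reader expects text. The checks also need a substring test (in) rather than equality (!=), because a field such as "xyz-102-o1-py-board" is never equal to "py-board", and the first attempt's `or if` is not valid syntax. A minimal sketch of the csv-based loop with those fixes, assuming a row should be dropped when any of its fields contains one of the strings; the function name purge_rows and its banned parameter are illustrative, not from the original:

```python
import csv


def purge_rows(src_path, dst_path, banned=("py-board", "coffee")):
    """Copy rows from src_path to dst_path, skipping any row in which
    some field contains one of the banned substrings."""
    # Python 3: open CSV files in text mode with newline='' (not 'rb'/'wb')
    with open(src_path, newline="") as inp, open(dst_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in csv.reader(inp):
            # substring test ('in'), not equality ('!='), so that
            # 'xyz-b13-coffee-2' is caught by 'coffee'
            if not any(word in field for field in row for word in banned):
                writer.writerow(row)
```

Usage: purge_rows('input_csv_file.csv', 'purged_csv_file.csv'). The newline='' argument follows the csv module documentation's recommendation for file objects passed to csv.reader and csv.writer.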

Upvotes: 0

Views: 4245

Answers (2)

The Punisher

Reputation: 562

I wouldn't use the csv module for this; plain file reading and writing is enough.

Try this code (I have written some comments to make it self-explanatory):

# We open the source file and get its lines
with open('input_csv_file.csv', 'r') as inp:
    lines = inp.readlines()

# We open the target file in write-mode
with open('purged_csv_file.csv', 'w') as out:
    # We go line by line writing in the target file
    # if the original line does not include the
    # strings 'py-board' or 'coffee'
    for line in lines:
        if 'py-board' not in line and 'coffee' not in line:
            out.write(line)
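
A variant of the same approach, sketched under the assumption that the list of unwanted strings may grow: the strings live in a tuple checked with any(), and the input file is iterated directly instead of through readlines(), so only one line is held in memory at a time. purge_lines and BANNED are illustrative names, not part of the answer above.

```python
BANNED = ("py-board", "coffee")  # strings that mark a line for removal


def purge_lines(src_path, dst_path, banned=BANNED):
    # Iterate over the input file lazily instead of calling readlines(),
    # so the whole file is never loaded at once.
    with open(src_path) as inp, open(dst_path, "w") as out:
        for line in inp:
            # keep the line only if none of the banned strings occur in it
            if not any(word in line for word in banned):
                out.write(line)
```

Like the answer's code, this checks the whole line, so a banned string appearing in the first column would also remove the row.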

Upvotes: 2

imdevskp

Reputation: 2223

# pandas helps to read and manipulate .csv files
import numpy as np
import pandas as pd

# read .csv file
df = pd.read_csv('input_csv_file.csv', sep=',', header=None)
print(df)
#               0                    1
# 0    173.20.1.1               2-base
# 1    174.28.2.2               2-game
# 2  174.27.3.109     xyz-b13-coffee-2
# 3   174.28.32.8               2-play
# 4    175.31.4.4  xyz-102-o1-py-board
# 5  176.32.3.129      xyz-b2-coffee-1
# 6    177.18.2.8        six-jump-walk

# filter rows
result = df[np.logical_not(df[1].str.contains('py-board') | df[1].str.contains('coffee'))]
print(result)
#              0              1
# 0   173.20.1.1         2-base
# 1   174.28.2.2         2-game
# 3  174.28.32.8         2-play
# 6   177.18.2.8  six-jump-walk

# save to result.csv file
result.to_csv('result.csv', index=False, header=False)
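
The two str.contains() calls and np.logical_not can also be collapsed into a single regex alternation negated with ~. A sketch, assuming the second column (label 1) is the one to check; dtype=str keeps every value as text, na=False keeps rows whose second field is empty, and purge_with_pandas is a hypothetical name:

```python
import re

import pandas as pd


def purge_with_pandas(src_path, dst_path, banned=("py-board", "coffee")):
    """Drop rows whose second column contains any banned substring."""
    # dtype=str keeps every value as text (no numeric conversion)
    df = pd.read_csv(src_path, header=None, dtype=str)
    # one alternation pattern; re.escape makes each banned string match literally
    pattern = "|".join(re.escape(word) for word in banned)
    # na=False: empty cells count as "no match", so those rows are kept
    mask = df[1].str.contains(pattern, regex=True, na=False)
    # '~' inverts the boolean mask, replacing np.logical_not
    result = df[~mask]
    result.to_csv(dst_path, index=False, header=False)
    return result
```

Usage: result = purge_with_pandas('input_csv_file.csv', 'result.csv').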

Upvotes: 0
