Reputation: 631
How can I remove a character from the value in the second column of a CSV file when that value starts with "(" or ends with ")"? I'm very new to Python, so any help solving this is appreciated.
Example:
0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,
to
0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,Java Archive (JAR) 4049-0,Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,WIN32 EXE 7-2,Ransom.Win32.TRX.XXPE50FFF027,
I have this code using DATA INFILE
TRIM(TRAILING ')' FROM TRIM(LEADING '('
How can I apply it here in my code:
with open(fullPath, 'rb') as file:
    csv_data = csv.reader(file)
    next(csv_data)
Upvotes: 0
Views: 794
Reputation: 4137
A solution using lstrip() and rstrip():
import csv

new_rows = []
with open('test.csv', 'rt', newline='') as file:
    csv_data = csv.reader(file, delimiter=',')
    for row in csv_data:
        # Strip leading "(" and trailing ")" characters from the second column only
        new_rows.append([row[0], row[1].lstrip('(').rstrip(')'), row[2]])
print(new_rows)
# For the two example lines, outputs [['0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe', 'Java Archive (JAR) 4049-0', 'Not Supported'], ['005c41fc0f8580f51644493fcbaa0d2d468312c3', 'WIN32 EXE 7-2', 'Ransom.Win32.TRX.XXPE50FFF027']]
Edit
To save the result to a new .csv file, just add:
with open('test2.csv', 'wt', newline='') as file:
    writer = csv.writer(file)
    for row in new_rows:
        writer.writerow(row)
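One caveat worth knowing: lstrip('(') and rstrip(')') remove every repeated parenthesis at that end, so a value such as (Java Archive (JAR)) would come out as Java Archive (JAR, losing the inner closing parenthesis. Below is a minimal sketch that removes at most one character from each end. The helper name strip_outer_parens and the sample row are made up for illustration and are not part of the answer above.

```python
def strip_outer_parens(value):
    """Remove at most one leading "(" and one trailing ")" (hypothetical helper)."""
    if value.startswith('('):
        value = value[1:]
    if value.endswith(')'):
        value = value[:-1]
    return value


# Made-up sample row whose second column ends with two closing parentheses
row = ['0023632f', '(Java Archive (JAR))', 'Not Supported']
row[1] = strip_outer_parens(row[1])
print(row)
```

For the values in the question both approaches give the same result; the difference only shows up when the value itself ends (or starts) with a parenthesis inside the outer pair.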
Upvotes: 2
Reputation: 51
This is a great opportunity to learn about regular expressions! Regular expressions are a method for recognising and dealing with patterns in text. Python has a regular expressions package as part of its standard library. I'm going to assume you're using Python 3 for the rest of this answer, where the package is named re.
The TLDR answer to your question is:
import re
string_without_parens = re.sub(r'(^\()|(\)$)', '', string_maybe_has_parens)
What's going on here, though? The re.sub() function takes three parameters: a regular expression (written here as a raw string, hence the leading r, so the backslashes reach the regex engine unchanged), the string to replace each match with, and the string to search in. The regular expression here is (^\()|(\)$). So what does that mean? Let's take it step by step:
() represents a capture group. These can be used to get the matches out, but I've used them simply to group the characters we're looking for. There are two capture groups in this regular expression: (^\() and (\)$).
The | character represents OR in regular expression language, so the pattern looks for something that matches either (^\() or (\)$).
(^\() has two things inside it (well, three really, but we'll get to that). The first is ^, which is called an anchor; this one says "only look at the start of the string". The second (and third) characters are \(, which says "I want to look for an opening parenthesis". Because parentheses have a special meaning in regular expressions, we have to use the backslash character to "escape" it.
(\)$) contains an escaped closing parenthesis \) and another anchor, $. This anchor represents the end of the string, in the same way ^ represented the start.
The re.sub() call then says: replace anything that matches this pattern with '' (i.e. nothing).
Hope that helps! If you want to play more with regular expressions, you can try out regexr, which helped me wrap my head around them.
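To tie this back to the CSV in the question, here is a rough sketch that applies the same pattern to the second column of each row. It reads from an in-memory string instead of a real file so that it runs on its own; the names OUTER_PARENS, sample and cleaned are just illustrative choices.

```python
import csv
import io
import re

# Same pattern as above: a "(" at the very start or a ")" at the very end
OUTER_PARENS = re.compile(r'(^\()|(\)$)')

# In-memory stand-in for the CSV file, using the two lines from the question
sample = io.StringIO(
    '0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,\n'
    '005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,\n'
)

cleaned = []
for row in csv.reader(sample):
    row[1] = OUTER_PARENS.sub('', row[1])  # only the second column is touched
    cleaned.append(row)

for row in cleaned:
    print(row[1])
```

Because the anchors only match at the very start and very end of the value, the inner (JAR) is left alone. With a real file you would pass the opened file object to csv.reader in place of sample.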
Upvotes: 0
Reputation: 3445
Here's one way of doing it: I remove the first occurrence of '(' and the last occurrence of ')' from each line. Hope it helps.
s = '''0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,(Java Archive (JAR) 4049-0),Not Supported,
005c41fc0f8580f51644493fcbaa0d2d468312c3,(WIN32 EXE 7-2),Ransom.Win32.TRX.XXPE50FFF027,'''

def last_replace(s, old, new, occurrence):
    '''Replaces the last occurrence(s) of old with new'''
    li = s.rsplit(old, occurrence)
    return new.join(li)

# Drop the last ')' and then the first '(' on every line
new_string = [last_replace(line, ')', '', 1).replace('(', '', 1) for line in s.split('\n')]
print(new_string)
Output:
['0023632fa4a860be8bc85ddf39fc19c3c4c2e6fe,Java Archive (JAR) 4049-0,Not Supported,',
'005c41fc0f8580f51644493fcbaa0d2d468312c3,WIN32 EXE 7-2,Ransom.Win32.TRX.XXPE50FFF027,']
PS: I stole the last_replace function from here.
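Note that this works on the whole line, so it relies on the second column holding both the first '(' and the last ')' of that line. A sketch of applying the same rsplit trick to the second field only, assuming fields never contain commas themselves; clean_second_field and the sample line are made up for illustration:

```python
def last_replace(s, old, new, occurrence):
    """Replace the last `occurrence` occurrences of `old` with `new`."""
    return new.join(s.rsplit(old, occurrence))


def clean_second_field(line):
    """Hypothetical helper: drop one '(' and one ')' from the second field only."""
    fields = line.split(',')
    fields[1] = last_replace(fields[1], ')', '', 1).replace('(', '', 1)
    return ','.join(fields)


# Made-up line where the third column also contains parentheses
line = 'abc123,(WIN32 EXE 7-2),Trojan (generic),'

print(last_replace(line, ')', '', 1).replace('(', '', 1))  # whole-line version
print(clean_second_field(line))                            # second field only
```

On this line the whole-line version removes the ')' from the third column instead, while the field-based variant leaves the other columns untouched.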
Upvotes: 0