Reputation: 124
I have the following csv file:
NAME DETAILS
abc type1: Y, Property: p1,p3 , type2:N
def type1: Y, Property: p2,p3 , type2:N
ghi type1: N, Property: p1,p2 , type2:Y
jkl type1: N, Property: p1,p3 , type2:Y
I want the output file to look like this:
NAME type1 Property type2
abc Y p1,p3 N
def Y p2,p3 N
ghi N p1,p2 Y
jkl N p1,p3 Y
I'm using Python and regular expressions. If I split the DETAILS column on ',', the Property values (e.g. p1,p3) also get split into separate columns. Is there a way to handle this?
Upvotes: 0
Views: 125
Reputation: 5308
A sample script that uses a regular expression with capture groups to extract the data:
script.py
#!/usr/bin/env python3
import re
import sys

def main():
    # Groups: 1 = name, 2 = type1, 3 = Property list (commas kept), 4 = type2
    p = re.compile(r"([a-z]+).*type1:\s+([A-Z]),\s+Property:\s+?([a-z0-9,]+)\s+,\s+?type2:([A-Z])")
    # Lines that don't match (e.g. the "NAME DETAILS" header) are skipped
    for line in sys.stdin:
        m = p.match(line)
        if m:
            print("\t".join(m.groups()))

if __name__ == "__main__":
    main()
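A variant of the same regex idea, sketched with named groups so the fields can be looked up by name instead of by position. The group names (`name`, `type1`, `property`, `type2`) and the `parse_line` helper are my own choices, and the pattern assumes the spacing shown in the sample lines:

```python
import re

# Named groups instead of numbered ones; the lazy [a-z0-9,]+? stops at the " , type2:" separator
LINE_RE = re.compile(
    r"(?P<name>[a-z]+)\s+type1:\s*(?P<type1>[A-Z]),\s*"
    r"Property:\s*(?P<property>[a-z0-9,]+?)\s*,\s*type2:\s*(?P<type2>[A-Z])"
)

def parse_line(line):
    """Return a dict of the four fields, or None if the line doesn't match (e.g. the header)."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

print(parse_line("abc type1: Y, Property: p1,p3 , type2:N"))
print(parse_line("NAME DETAILS"))
```

The `\s*` around the separators makes it a little more tolerant of missing spaces than the `\s+` version above.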
Upvotes: 0
Reputation: 25371
There are many ways to do this, but I would split each line on any punctuation/whitespace characters and then reassemble the fields manually in the order you want:
import re

t = """abc type1: Y, Property: p1,p3 , type2:N
def type1: Y, Property: p2,p3 , type2:N
ghi type1: N, Property: p1,p2 , type2:Y
jkl type1: N, Property: p1,p3 , type2:Y""".split('\n')

for x in t:
    # Word tokens: [name, 'type1', t1, 'Property', p_a, p_b, 'type2', t2]
    y = re.findall(r"[\w']+", x)
    # Assumes exactly two Property values per line
    print('\t'.join((y[0], y[2], y[4] + ',' + y[5], y[7])))
> abc Y p1,p3 N
> def Y p2,p3 N
> ghi N p1,p2 Y
> jkl N p1,p3 Y
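The `findall` version hard-codes two Property values (`y[4]` and `y[5]`). A sketch that copes with any number of them splits only on commas that are followed by a `key:` label, using a lookahead; the commas inside `p1,p2,p3` are never followed by a label, so they survive. The `parse_line` helper and the three-value sample line are made up for illustration:

```python
import re

# Split on a comma only when the next thing is a "key:" label such as "Property:" or "type2:"
LABEL_SPLIT = re.compile(r"\s*,\s*(?=\w+:)")

def parse_line(line):
    """Return [name, type1, Property, type2] for one input line (hypothetical helper)."""
    name, details = line.split(None, 1)
    fields = {}
    for part in LABEL_SPLIT.split(details.strip()):
        key, _, value = part.partition(":")  # split on the first ':' only
        fields[key.strip()] = value.strip()
    return [name, fields["type1"], fields["Property"], fields["type2"]]

print(parse_line("abc type1: Y, Property: p1,p3 , type2:N"))      # ['abc', 'Y', 'p1,p3', 'N']
print(parse_line("xyz type1: N, Property: p1,p2,p3 , type2:Y"))   # ['xyz', 'N', 'p1,p2,p3', 'Y']
```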
Another way, without regex, is to replace the delimiting characters and then split on whitespace. Something like this:
# Tokens become [name, 'type1', t1, 'Property', props, 'type2', t2]; [::2] keeps the values
print([x.replace(':', ' ').replace(', ', ' ').split()[::2] for x in t])
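Along the same no-regex lines, here is a sketch that cuts each line at the literal labels with `str.partition` and writes the result, header included, with the `csv` module. It assumes the labels appear exactly as `" type1:"`, `", Property:"` and `", type2:"` in every line; `parse_line` and `convert` are hypothetical helpers, and the demo uses in-memory `io.StringIO` objects instead of real files:

```python
import csv
import io

def parse_line(line):
    # Cut at the labels rather than at every comma, so the Property list stays intact
    name, _, rest = line.partition(" type1:")
    type1, _, rest = rest.partition(", Property:")
    prop, _, type2 = rest.partition(", type2:")
    return [name.strip(), type1.strip(), prop.strip(), type2.strip()]

def convert(src, dst):
    """Read 'NAME DETAILS' lines from src and write a tab-separated table to dst."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(["NAME", "type1", "Property", "type2"])
    next(src, None)  # skip the input header line
    for line in src:
        if line.strip():
            writer.writerow(parse_line(line))

src = io.StringIO("NAME DETAILS\nabc type1: Y, Property: p1,p3 , type2:N\n")
out = io.StringIO()
convert(src, out)
print(out.getvalue())
```

With real files you would pass open file objects instead, e.g. `open("input.txt")` and `open("output.tsv", "w", newline="")` (placeholder names). If you want a comma-separated file instead, drop `delimiter="\t"`: `csv.writer` then quotes the Property field (`"p1,p3"`) so it stays a single column.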
Upvotes: 1