blackfury

Reputation: 685

Extract data from a file using Python

Input File:

["abc","on time","date","<a href='link'>11111</a>","time","2","2"],

["abc","on time","date","<a href='link'>11111</a>","time","2","6"],

["abc","on time","date","<a href='link'>11111</a>","time","2","9"],

["abc","on time","date","<a href='link'>11111</a>","time","2","0"],

["abc","on time","date","<a href='link'>11111</a>","time","2","5"]

Desired output:

abc,on time,date,<a href='link'>11111</a>,time,2,2

abc,on time,date,<a href='link'>11111</a>,time,2,6

abc,on time,date,<a href='link'>11111</a>,time,2,9

abc,on time,date,<a href='link'>11111</a>,time,2,0

abc,on time,date,<a href='link'>11111</a>,time,2,5

Code tried:

import sys
import re

Lines = [Line.strip() for Line in open (sys.argv[1],'r').readlines()]



for EachLine in Lines:
    Parts = EachLine.split(",")
    for EachPart in Parts:

        EachPart = re.sub(r'[', '', EachPart)
        EachPart = re.sub(r']', '', EachPart)
print ' '.join(Parts)

Can anyone help me with this? I am not getting the output I want. Thanks in advance.

Upvotes: 0

Views: 62

Answers (3)

qwertyuip9

Reputation: 1632

Another option without using regex is:

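for line in lines:
  # Drop the trailing comma and the surrounding brackets, then remove the quotes
  formatted = line.strip().rstrip(',').strip('[]').replace('"', '')
  if formatted:  # skip blank lines
    print(formatted)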
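
If each line is valid JSON once the trailing comma is dropped (the sample lines appear to be), the standard json module is another regex-free route. The following is only a sketch under that assumption; the helper name json_line_to_csv is made up for illustration.

```python
import json

def json_line_to_csv(line):
    """Turn one '["a","b"],' line into 'a,b' (hypothetical helper)."""
    # Drop surrounding whitespace and the comma that separates records.
    cleaned = line.strip().rstrip(',')
    if not cleaned:
        return None  # blank line, nothing to convert
    # json.loads parses the bracketed list into a Python list of strings.
    return ','.join(json.loads(cleaned))

sample = '["abc","on time","date","<a href=\'link\'>11111</a>","time","2","2"],'
print(json_line_to_csv(sample))
```

Unlike plain string replacement, this keeps double quotes that are part of a field's content, but it will raise an error on any line that is not valid JSON.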

Upvotes: 0

Sait

Reputation: 19805

As already mentioned, you can use eval().

with open('a.txt') as f:
    for line in f:
        line = line.replace(',\n', '\n').strip()  # drop a trailing `,` if present
        if line:                                  # skip empty lines
            print(','.join(eval(line)))

Input:

["abc","on time","date","<a href='link'>11111</a>","time","2","2"],

["abc","on time","date","<a href='link'>11111</a>","time","2","6"],

["abc","on time","date","<a href='link'>11111</a>","time","2","9"],

["abc","on time","date","<a href='link'>11111</a>","time","2","0"],

["abc","on time","date","<a href='link'>11111</a>","time","2","5"]

Output:

abc,on time,date,<a href='link'>11111</a>,time,2,2
abc,on time,date,<a href='link'>11111</a>,time,2,6
abc,on time,date,<a href='link'>11111</a>,time,2,9
abc,on time,date,<a href='link'>11111</a>,time,2,0
abc,on time,date,<a href='link'>11111</a>,time,2,5
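
Because eval() executes whatever expression the file contains, it is risky on input you do not control. A possible substitute, sketched here rather than taken from the answer above, is ast.literal_eval, which only accepts Python literals. The function name parse_rows and the shortened sample text are illustrative assumptions.

```python
import ast

def parse_rows(text):
    """Parse lines such as '["a","b"],' into lists of strings (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Remove whitespace and the trailing comma between records.
        line = line.strip().rstrip(',')
        if line:  # skip empty lines
            # literal_eval accepts only literals, so arbitrary code is rejected.
            rows.append(ast.literal_eval(line))
    return rows

text = '''["abc","on time","2","2"],

["abc","on time","2","6"]'''

for row in parse_rows(text):
    print(','.join(row))
```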

Upvotes: 0

Azmi Kamis

Reputation: 901

I modified your initial solution to

import sys
import re

Lines = [Line.strip() for Line in open (sys.argv[1],'r').readlines()]

for EachLine in Lines:
    matches = re.findall(r'\"(.+?)\"',EachLine)
    print(','.join(matches))

My approach is to use a regex to extract every string enclosed in double quotes.
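
As a small, self-contained illustration of the quoted-string idea, the sketch below wraps the pattern in a function. It uses `.*?` instead of `.+?` so that an empty field (`""`) is kept as an empty string; that change, the name quoted_fields, and the sample line with an empty field are assumptions added for the example.

```python
import re

# Non-greedy match of everything between a pair of double quotes.
QUOTED = re.compile(r'"(.*?)"')

def quoted_fields(line):
    """Return the contents of every double-quoted field on the line."""
    return QUOTED.findall(line)

line = '["abc","on time","","<a href=\'link\'>11111</a>","2"],'
print(','.join(quoted_fields(line)))
```

This works as long as no field contains a double quote of its own; if that can happen, a real parser is likely the safer choice.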

Upvotes: 1
