eWizardII

Reputation: 1936

Python CSV Scraping

I have a CSV file which has data organized as follows:

Name: xyz
DNS:  xyz
Type: xyz
Date: xyz

Name: xyz
DNS:  xyz
Type: xyz
Date: xyz

Name: xyz
DNS:  xyz
Type: xyz
Date: xyz

This continues for n users.

I'm trying to figure out how to read this data properly in Python. It doesn't seem like a hard problem; I'm just confused about how to read the information, since this isn't the usual layout of a CSV file. It would be easier if each row were Name,DNS,etc., because then I would know how to handle it.

I started with something like this:

import csv
r = csv.reader(open("data.csv"))

Calling r.next() now returns the file line by line, but that isn't helpful on its own: my plan is to loop over the records and increment a counter whenever a record's date is later than a certain time and its type field matches a certain value.

This is kind of close to what I am doing in the sense of how the data is structured, but I don't think it will help me in my quest:

How can I scrape data from a text table using Python?

Upvotes: 0

Views: 347

Answers (3)

khachik

Reputation: 28703

As others mentioned, you don't need a CSV reader (you can use one, but it brings no benefit here). Just read the data file and keep some state for the current section. On each blank line, store the current section and reset the state.

Something like this should work:

def load(lines):
    data = []
    current = {}
    for line in lines:
        line = line.strip()  # drop the newline and surrounding whitespace
        if not line:
            # a blank line ends the current section
            if current:
                data.append(current)
                current = {}
            continue
        # partition splits at the first ':' only, so values may contain colons
        key, sep, value = line.partition(':')
        if not sep:  # unknown format: ignore the line (or raise an exception)
            continue
        current[key.strip()] = value.strip()
    if current:  # the file may not end with a blank line
        data.append(current)
    return data

import sys

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        print(load(f))
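
To connect this back to the counting goal in the question, here is a minimal sketch of how the parsed records might be filtered. It assumes the Date values use a YYYY-MM-DD format, which the question does not specify; count_matching, the sample records, and their values are all hypothetical.

```python
from datetime import datetime


def count_matching(records, wanted_type, after, date_format="%Y-%m-%d"):
    """Count records whose Type equals wanted_type and whose Date is later than `after`."""
    count = 0
    for record in records:
        if record.get("Type") != wanted_type:
            continue
        try:
            # ASSUMPTION: dates look like 2012-01-15; adjust date_format otherwise.
            when = datetime.strptime(record.get("Date", ""), date_format)
        except ValueError:
            continue  # skip records with a missing or malformed date
        if when > after:
            count += 1
    return count


# Hypothetical records, shaped like a list of one dict per section.
records = [
    {"Name": "a", "DNS": "a.example", "Type": "A", "Date": "2012-01-15"},
    {"Name": "b", "DNS": "b.example", "Type": "MX", "Date": "2012-03-01"},
    {"Name": "c", "DNS": "c.example", "Type": "A", "Date": "2011-12-31"},
]
print(count_matching(records, "A", datetime(2012, 1, 1)))  # prints 1
```

If the real dates include a time or use another layout, only the date_format argument should need to change.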

Upvotes: 1

C3roe

Reputation: 96417

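You could try to read that data by passing format parameters (**fmtparams) to csv.reader, with delimiter set to \n and lineterminator set to \n\n. (Or use \r\n or just \r in place of each \n, depending on the line ending format of your file.) Be aware, though, that the csv documentation notes the reader is hard-coded to recognise \r or \n as end-of-line and ignores lineterminator, so this may not split the records as intended.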

Then you would get Name: xyz, DNS: xyz etc. as contents of the “columns” of your “csv” file – and you would only have to split those at the colon for further processing …
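
The same idea, one record per blank-line-separated block and one field per line split at the colon, can also be sketched without the csv module. This is only an illustration: parse_blocks and the sample text are hypothetical, and it assumes the whole file fits in memory.

```python
import re


def parse_blocks(text):
    """Split text into blank-line-separated blocks, then each line at its first colon."""
    records = []
    # One or more blank lines (possibly containing whitespace) separate records.
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # ignore lines that have no colon
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


# Hypothetical sample in the layout shown in the question.
sample = (
    "Name: alice\nDNS:  a.example\nType: A\nDate: 2012-01-15\n"
    "\n"
    "Name: bob\nDNS:  b.example\nType: MX\nDate: 2012-03-01\n"
)
for record in parse_blocks(sample):
    print(record)
```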

Upvotes: 1

dm03514

Reputation: 55972

That's not a csv file at all. If that is your format you could scan the file until you reach a blank new line, that denotes a section. You could then parse each section accordingly.

I don't think csv is going to be any help here.

You can just read the file and iterate over it line by line:

with open('data.csv') as f:
    for line in f:
        pass  # handle each line here
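
Scanning until a blank line marks the end of a section could look roughly like the sketch below. It uses itertools.groupby to collect runs of non-blank lines; iter_sections and the sample lines are hypothetical names chosen for illustration.

```python
from itertools import groupby


def iter_sections(lines):
    """Yield each run of consecutive non-blank lines as a list of stripped lines."""
    stripped = (line.strip() for line in lines)
    # groupby keys each line by whether it has any text left after stripping.
    for has_text, group in groupby(stripped, key=bool):
        if has_text:
            yield list(group)


# Hypothetical sample; a real call would pass an open file object instead.
sample_lines = ["Name: alice\n", "Type: A\n", "\n", "Name: bob\n", "Type: MX\n"]
for section in iter_sections(sample_lines):
    print(section)
```

Because it works on any iterable of lines, the file never has to be loaded into memory all at once.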

Upvotes: 1
