theprowler

Reputation: 3600

Python - read a file and append it to a Dataframe line by line

Can I read a file with Python, and then directly append data from each line to a Pandas dataframe?

The data I want to parse is contained in the body of an email:

(image of the email body, not reproduced here)

I tried using RegEx to capture the following data:

Species: GB EAST cod, GB blackback, etc

Sector: NEFS 5

Pounds: 954, 30,000, etc

Prices: $0.83, $0.07, etc

and the Date: 09/01/2014

but it proved very difficult to capture all of that...

I can get the Date easily, since it always appears after "Sent:". I use RegEx to capture everything after "Sent:" and then dateutil to parse the date.
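A minimal sketch of that date step, assuming the email body is already available as a string and that the "Sent:" header sits on a line of its own. The function name `extract_sent_date` and the sample text are hypothetical, not taken from the actual email:

```python
import re
from dateutil import parser

def extract_sent_date(email_text):
    """Return the date found on the first 'Sent:' line, or None if there is none."""
    # MULTILINE lets ^ and $ match at the start and end of each line
    match = re.search(r"^Sent:\s*(.+)$", email_text, flags=re.MULTILINE)
    if match is None:
        return None
    # dateutil parses the captured text; .date() drops the time of day
    return parser.parse(match.group(1).strip()).date()

# Hypothetical sample input
sample = (
    "From: someone@example.com\n"
    "Sent: Monday, September 01, 2014 10:15 AM\n"
    "Subject: quota"
)
print(extract_sent_date(sample))
```

If the header in the real emails is formatted differently, only the pattern should need to change.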

The Sector is easy enough too: I have RegEx search for each of the 20 sectors and capture whichever one it finds.
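The sector lookup could be sketched roughly as follows. Only "NEFS 5" comes from the email; the other names in `SECTORS` are placeholders standing in for the real list of 20, and `find_sector` is a hypothetical helper name:

```python
import re

# Placeholder list: only "NEFS 5" appears in the question
SECTORS = ["NEFS 1", "NEFS 5", "NEFS 11"]

# Escape each name and try longer names first, so "NEFS 11" is preferred over "NEFS 1"
_alternatives = "|".join(
    re.escape(name) for name in sorted(SECTORS, key=len, reverse=True)
)
SECTOR_PATTERN = re.compile(r"\b(?:" + _alternatives + r")\b")

def find_sector(email_text):
    """Return the first known sector name found in the text, or None."""
    match = SECTOR_PATTERN.search(email_text)
    return match.group(0) if match else None

print(find_sector("Offer from NEFS 5 for GB EAST cod"))
```

Compiling one alternation pattern avoids looping over 20 separate searches, and the word boundaries keep a short name from matching inside a longer one.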

But capturing the species, pounds, and price data and making sure they line up correctly, AND putting them into a dataframe neatly is where I am stuck. So my thinking now is to just capture each line in the body of the email and break up what I capture into different columns for the dataframe.

I know that isn't the cleanest capture but I'd rather get too much of the data and just have to delete some manually later than not get enough of it.

So my question is: with Python can I read a file and transfer everything I read into a Pandas dataframe?

Upvotes: 0

Views: 8981

Answers (2)

Jammeth_Q

Reputation: 128

This is an overly-specific function I made for reading the fish section of your email once I put it in a text file. It assumes you've already pulled out the date and the sector.

It might not work exactly for your implementation, but hopefully the use of python string methods will get you in the right direction, and show you how to add it all into a DataFrame.

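import pandas as pd

def fish_to_frame(fish_file, sector, date):
    # Collect one entry per parsed line
    species = []
    pounds = []
    prices = []
    date = pd.to_datetime(date)
    with open(fish_file) as f:
        for line in f:
            # Expected line format: "Fish: weight @ price"
            if ':' not in line:
                continue  # skip blank or unrelated lines
            fish, remainder = line.split(':', 1)
            if '@' in remainder:
                weight, price = remainder.split('@', 1)
            elif 'trade' in remainder and 'to ' in remainder:
                weight, price = remainder.split('to ', 1)
            else:
                continue  # no recognizable weight/price pair
            # Remove the unit label and surrounding whitespace
            weight = weight.replace('lbs', '').strip()
            species.append(fish.strip())
            pounds.append(weight)
            prices.append(price.strip())
    # The scalar sector and date are broadcast to every row
    fish_frame = pd.DataFrame({'Species': species,
                               'Sector': sector,
                               'Pounds': pounds,
                               'Prices': prices,
                               'Date': date})
    return fish_frame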

You could add further steps in there, such as converting the weights and prices to numeric types. Hope this helps!
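As a rough sketch of that numeric conversion, assuming the pounds and prices arrive as strings like "30,000" and "$0.83". The sample frame below is made up to match the column names used by `fish_to_frame`, and the pairing of each weight with each price is illustrative only:

```python
import pandas as pd

# Hypothetical rows using the column names from fish_to_frame
frame = pd.DataFrame({
    "Species": ["GB EAST cod", "GB blackback"],
    "Pounds": ["954", "30,000"],
    "Prices": ["$0.83", "$0.07"],
})

# Drop thousands separators, then convert to numbers
frame["Pounds"] = pd.to_numeric(
    frame["Pounds"].str.replace(",", "", regex=False)
)

# Drop the currency symbol, then convert to numbers
frame["Prices"] = pd.to_numeric(
    frame["Prices"].str.replace("$", "", regex=False).str.strip()
)

print(frame.dtypes)
```

Passing `regex=False` matters for the price column, because `$` would otherwise be read as a regular-expression anchor rather than a literal character.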

Another step could be combining the result with an existing DataFrame that already has those columns. Adding new entries to a DataFrame one line at a time would be slower than building the lists first and creating the frame once, as the function above does.
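One way to do that combination is `pd.concat`, sketched here with two small made-up frames that share the same columns:

```python
import pandas as pd

# Hypothetical frame that already holds earlier emails
existing = pd.DataFrame({
    "Species": ["GB EAST cod"],
    "Sector": ["NEFS 5"],
    "Pounds": [954],
})

# Hypothetical frame parsed from a new email
new_rows = pd.DataFrame({
    "Species": ["GB blackback"],
    "Sector": ["NEFS 5"],
    "Pounds": [30000],
})

# ignore_index=True renumbers the rows 0..n-1 in the combined frame
combined = pd.concat([existing, new_rows], ignore_index=True)
print(combined)
```

With many emails, collecting the per-email frames in a list and calling `pd.concat` once at the end avoids copying the growing frame on every iteration.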

Upvotes: 1

Sudhir Chauhan

Reputation: 83

Yes, once you have data in a file, you can use pandas.read_csv('filename.csv'). Check pandas.read_csv for details.
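`read_csv` is not limited to comma-separated input. Here is a small sketch, assuming each line of the saved email body looks like "Species: weight @ price". That line format is a guess, and the sample text and column names are hypothetical. An in-memory buffer stands in for the file so the example is self-contained:

```python
import io
import pandas as pd

# Hypothetical file contents; with a real file, pass its path instead of the buffer
text = (
    "GB EAST cod: 954 lbs @ $0.83\n"
    "GB blackback: 30,000 lbs @ $0.07\n"
)

frame = pd.read_csv(
    io.StringIO(text),
    sep=":",                       # split each line at the colon, not at commas
    header=None,                   # the data has no header row
    names=["Species", "Details"],  # assumed column names
    skipinitialspace=True,         # drop the space that follows the colon
)
print(frame)
```

The "Details" column would still need to be split into pounds and price afterwards, so this only replaces the line-reading loop, not the parsing.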

Upvotes: 0
