Praful S

Reputation: 43

Python: Extract a particular column (containing special characters) from a CSV file using pandas

I have a file (tests.txt) containing data in the format below:

NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig
001,\t\tFile1.csv,\t\tcube,\t\twidth height size
002,\t\tFile2.csv,\t\tsquare,\t\tlength param

Displayed normally, it looks like:

HLM_TIER,    Filename,                  TestName,               Config
001,         File1.csv,                 cube,                   width height size
002,         File2.csv,                 square,                 length param

I want to extract a particular column (TestName) from this file.

Code tried:

import pandas as pd
data = pd.read_csv('tests.txt', skipinitialspace=True)
TestName = data.TestName
TestName = TestName.str.strip(' \t')  # a Series needs the .str accessor for string methods

But I get the following error:

Traceback (most recent call last):
  File "C:\Users\temp.py", line 23, in <module>
    TestName = data.TestName
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2246, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'TestName'

I think the error is caused by the tabs in the header, which make pandas read the column name as "\t\t\t\t\tTestName" instead of "TestName". But I am not sure how to resolve the issue. NOTE: I cannot change the tests.txt file.
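To check this diagnosis without touching tests.txt, here is a minimal sketch that parses an in-memory copy of the sample (the sample string is hypothetical, built from the tab layout shown above), prints the raw column labels, and then strips both the labels and the values. It uses default read_csv settings, i.e. without skipinitialspace.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of tests.txt, using the tab layout shown above.
sample = (
    "NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig\n"
    "001,\t\tFile1.csv,\t\tcube,\t\twidth height size\n"
    "002,\t\tFile2.csv,\t\tsquare,\t\tlength param\n"
)

df = pd.read_csv(io.StringIO(sample))

# The header cells keep their leading tabs, so no column is named plain 'TestName'.
raw_columns = list(df.columns)
print(raw_columns)

# Strip whitespace from the column labels, then from the values in the column.
df.columns = df.columns.str.strip()
test_names = df["TestName"].str.strip()
print(test_names.tolist())
```

Reading tests.txt from disk works the same way: pass the filename instead of io.StringIO(sample).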

Upvotes: 3

Views: 655

Answers (3)

Sergey Sergienko

Reputation: 365

df = pd.read_csv('Foo.txt', delim_whitespace=True)
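One caveat on this one-liner: delim_whitespace=True splits on every run of whitespace (like sep=r'\s+'), so multi-word Config values such as "width height size" would be split into separate fields and the commas would stay attached to tokens like "001,". A minimal sketch of a different technique, a regular-expression separator that treats a comma plus any surrounding whitespace as a single delimiter, is below. It assumes the python parsing engine and uses a hypothetical in-memory copy of tests.txt.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of tests.txt (tabs around the commas, spaces inside Config).
sample = (
    "NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig\n"
    "001,\t\tFile1.csv,\t\tcube,\t\twidth height size\n"
    "002,\t\tFile2.csv,\t\tsquare,\t\tlength param\n"
)

# The regex separator swallows whitespace on both sides of each comma,
# so column labels and values come out clean and Config keeps its spaces.
df = pd.read_csv(io.StringIO(sample), sep=r"\s*,\s*", engine="python")
print(df["TestName"].tolist())
print(df["Config"].tolist())
```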

Upvotes: 0

matt_s

Reputation: 1075

Can you just remove all the tabs?

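import pandas as pd
from io import StringIO  # Python 2: from StringIO import StringIO

# Read the whole file, drop every tab character, then parse the cleaned text.
with open('tests.txt', 'r') as f:
    df = pd.read_csv(StringIO(f.read().replace('\t', '')))

print(df.TestName)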

Upvotes: 1

Anand S Kumar
Anand S Kumar

Reputation: 90929

You can use converters to strip the values as they are read in. To do this, write a function that performs the stripping, then pass a dict that maps each column name to that function as the converters argument.

You should also specify the column names manually with the names argument and skip the file's own header row.

Example -

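import pandas as pd

def strip(x):
    # Remove surrounding whitespace (including tabs) from string values.
    try:
        return x.strip()
    except AttributeError:
        return x

col_names = ['HLM_TIER', 'Filename', 'TestName', 'Config', ...]  # replace ... with any remaining column names
col_mapping = {key: strip for key in col_names}
# header=0 discards the file's own (tab-padded) header row so col_names is used instead.
data = pd.read_csv('tests.txt', names=col_names, header=0, converters=col_mapping)
print(data.TestName)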
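Since only one column is needed, the same idea can be narrowed with usecols so that only TestName is kept and only that column gets a converter. This is a minimal sketch assuming the file has exactly the four columns shown in the question; the in-memory sample string is a hypothetical stand-in for tests.txt, and str.strip is used directly because converters receive each value as a string.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of tests.txt; pass the real filename when reading from disk.
sample = (
    "NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig\n"
    "001,\t\tFile1.csv,\t\tcube,\t\twidth height size\n"
    "002,\t\tFile2.csv,\t\tsquare,\t\tlength param\n"
)

col_names = ["HLM_TIER", "Filename", "TestName", "Config"]
df = pd.read_csv(
    io.StringIO(sample),
    names=col_names,                      # clean labels of our own
    header=0,                             # replace the file's tab-padded header row
    usecols=["TestName"],                 # keep only the column we want
    converters={"TestName": str.strip},   # strip tabs/spaces from each value
)
print(df["TestName"].tolist())
```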

Upvotes: 3
