Reputation: 43
I have a file (tests.txt) containing data in the following format:
NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig
001,\t\tFile1.csv,\t\tcube,\t\twidth height size
002,\t\tFile2.csv,\t\tsquare,\t\tlength param
It normally looks like this:
HLM_TIER, Filename, TestName, Config
001, File1.csv, cube, width height size
002, File2.csv, square, length param
I want to extract a particular column (TestName) from this file.
import pandas as pd
data = pd.read_csv('tests.txt', skipinitialspace=True)
TestName = data.TestName
TestName = TestName.strip(' \t')
Traceback (most recent call last):
File "C:\Users\temp.py", line 23, in <module>
TestName = data.TestName
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2246, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'TestName'
I think the error is caused by the tabs in the header, which make pandas read the column name as "\t\t\t\t\tTestName", but I am not sure how to resolve the issue. NOTE: I cannot change the "tests.txt" file.
Upvotes: 3
Views: 655
Reputation: 1075
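Can you just remove all the tabs before parsing?
import pandas as pd
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO

# Read the whole file, drop every tab, then hand the cleaned text to pandas.
with open('tests.txt', 'r') as f:
    df = pd.read_csv(StringIO(f.read().replace('\t', '')))
df.TestName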
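To check the idea without touching the real file, here is a minimal sketch that runs on an in-memory copy of the sample rows from the question. It assumes Python 3 (hence `io.StringIO`) and that the `\t` sequences in the question stand for real tab characters; the names `raw` and `test_names` are only for illustration, and `dtype=str` is an added choice so that `001` is not parsed as the integer 1.

```python
import io

import pandas as pd

# Sample text mirroring the layout shown in the question (assumed to use real tabs).
raw = (
    "NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig\n"
    "001,\t\tFile1.csv,\t\tcube,\t\twidth height size\n"
    "002,\t\tFile2.csv,\t\tsquare,\t\tlength param\n"
)

# Remove every tab, then parse; dtype=str keeps values such as "001" as text.
df = pd.read_csv(io.StringIO(raw.replace("\t", "")), dtype=str)

test_names = df["TestName"].tolist()
print(test_names)
```

Note that this removes tabs everywhere, so it is only suitable when no field legitimately contains a tab.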
Upvotes: 1
Reputation: 90929
You can use converters to strip the data as you read it in. To do this, create a function that does the stripping, then pass read_csv a dict mapping each column to that function.
You should also specify the column names manually using the names argument and skip the header row.
Example -
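def strip(x):
    # Strip surrounding whitespace (including tabs); pass non-strings through unchanged.
    try:
        return x.strip()
    except AttributeError:
        return x

col_names = ['HLM_TIER', 'Filename', 'TestName', 'Config', ...]
col_mapping = {key: strip for key in col_names}
# header=0 discards the file's own header row in favour of col_names.
data = pd.read_csv('tests.txt', names=col_names, header=0, converters=col_mapping)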
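If hard-coding the column names is undesirable, a different technique from the converters approach is to read the file as-is and strip the header and the values afterwards. The following is only a sketch under assumptions: it uses Python 3, an in-memory copy of the question's sample rows (with `\t` taken to mean real tabs), and `dtype=str` so that every column can be stripped with the `.str` accessor. The name `raw` is illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for tests.txt, copied from the sample in the question.
raw = (
    "NUMBER,\tFilename,\t\t\t\t\tTestName,\t\t\t\tConfig\n"
    "001,\t\tFile1.csv,\t\tcube,\t\twidth height size\n"
    "002,\t\tFile2.csv,\t\tsquare,\t\tlength param\n"
)

# Read everything as text; column names and values still carry leading tabs here.
df = pd.read_csv(io.StringIO(raw), dtype=str)

# Strip whitespace from the header, then from every (string) column.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

print(df["TestName"].tolist())
```

For the real file, `io.StringIO(raw)` would be replaced by the path `'tests.txt'`.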
Upvotes: 3