Reputation: 610
I'm reading a file with numerical values.
data = pd.read_csv('data.dat', sep=' ', header=None)
In the text file, each row ends with a space, so pandas expects one more value and adds a NaN at the end of each row. For example:
2.343 4.234
is read as: [2.343, 4.234, nan]
I can avoid it using usecols=[0, 1]
but I would prefer a more general solution
Upvotes: 4
Views: 5319
Reputation: 1892
You can simply use:
data = pd.read_csv('data.dat', sep=' ', header=None,
                   index_col=False)  # fixes files with a delimiter at the end of each line
From the pandas documentation:
Note:
index_col=False
can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
This should work regardless of what type of delimiter you have (e.g. spaces, tabs, commas, etc.).
Upvotes: 1
Reputation: 131
You can use regular expressions in your sep
argument.
Instead of specifying the separator as a single space, you can tell pandas to treat any run of whitespace between values as the separator. You can do this with the regular expression \s+:
data = pd.read_csv('data.dat', sep=r'\s+', header=None)
Upvotes: 9
Reputation: 134
Can you change the separator in the csv file to something other than a space? The trailing space is likely the reason each row ends with a NaN. If you use:
data = pd.read_csv('data.dat', sep=',', header=None)
for example, the problem might be solved without having to clean the data.
Upvotes: 0
Reputation: 2112
Specifying which columns to read with usecols
is the cleaner approach. Alternatively, you can drop the extra column after reading the data, but that comes with the overhead of reading data you don't need. A fully generic approach would require writing a regex parser, which is more time-consuming and messier.
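A minimal sketch of both options; it uses an in-memory string standing in for data.dat, with each row ending in a trailing space as described in the question:

```python
import io

import pandas as pd

# Simulated file contents: each row ends with a trailing space,
# which makes pandas see an extra, empty column.
text = "2.343 4.234 \n1.100 2.200 \n"

# Option 1: read only the columns you need.
df1 = pd.read_csv(io.StringIO(text), sep=' ', header=None, usecols=[0, 1])

# Option 2: read everything, then drop columns that are entirely NaN.
df2 = pd.read_csv(io.StringIO(text), sep=' ', header=None).dropna(axis=1, how='all')

print(df1.shape)  # (2, 2)
print(df2.shape)  # (2, 2)
```

Option 2 is more general when you don't know the column count in advance, at the cost of parsing the spurious column first.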
Upvotes: 0