Reputation: 5408
I could not find such an option in the documentation. A measuring device spits out everything in Excel:
<> A B C
1
2
3
When I delete the "<>" characters manually everything works fine. Is there a way to circumvent that (without conversion to csv)?
I do:
import pandas as pd
df = pd.read_excel(filename,sheetname,skiprows=0,header=0,index_col=0)
skiprow = 1
does not do the trick since pandas uses the first row as names. If I supply names = list(range(1, 4))
the first data row is lost.
Upvotes: 3
Views: 10737
Reputation: 430
Expanding on Peruz's answer:-
For your case, using regex
df = pd.read_csv(filename, sep="(?<!<>)\s+", engine='python')
This should read in the columns properly, except that the first column would be named <> A
To change this, simply alter the first column name
df.columns = pd.Series(df.columns.str.replace("<>\s", ""))
In the regex expression, \s+
matches any number of space characters except when preceded by whatever is mentioned in the negative lookaround denoted by (?<!charceters_to_ignore)
Upvotes: 4
Reputation: 433
Pandas read_csv()
supports regex
. You can avoid matching the white space if it is preceded by something (in your case #). Just as an example, avoiding "!":
sep='(?<!\\!)\s+'
if you want you could rename the column to remove the initial character and white space.
cheers
Upvotes: 1
Reputation: 43
I have the same problem. My first line is
# id x y ...
So pandas header keyword doesn't work. I did the following by reading it twice:
cos_phot_header = pd.read_csv(table, delim_whitespace=True, header=None, engine='python', nrows=1)
cos_plot_text_header = cos_phot_header.drop(0, axis=1).values.tolist()
cos_phot_data = pd.read_csv(table, skip_blank_lines=True, comment='#',
delim_whitespace=True, header=None, engine='python', names=cos_plot_text_header[0])
I don't understand why there is no such option in pandas to do this, it is a very common problem that everyone encounters. You can also read the table with no lines (nrows=0) and use .columns, but honestly I think it is an equally ugly solution to the problem.
Upvotes: 1
Reputation: 391
I have the same problem. My first line is:
# id ra dec ...
Where #
is the commenting-character in Python. import_csv
thinks that #
is a column header, but it's not.
The workaround I used was to define the headers manually:
headerlist = ['id', 'ra', 'dec', ...]
df = pd.read_csv('data.txt', index_col=False, header=0, names=headerlist)
Note that index_col
is optional in regards to this problem.
If there is any option to ignore a certain character in header line, I haven't found it. Hope this solution can be improved upon.
Upvotes: 0
Reputation: 1366
Another option would be:
f = open(fname, 'r')
line1 = f.readline()
data1 = pd.read_csv(f, sep='\s+', names=line1.replace(' #', '').split(), dtype=np.float)
You might have a different separator though.
Upvotes: 1