Reputation: 185
Currently, I have to read the CSV file with all the headers set in advance, and then drop the columns I don't want. Is there any way to do this directly?
# Current Code
import pandas as pd

columns_name = ['station', 'date', 'observation', 'value', 'other_1',
                'other_2', 'other_3', 'other_4']
del_columns_name = ['other_1', 'other_2', 'other_3', 'other_4']
df = pd.read_csv('filename', names=columns_name)
# drop() returns a new DataFrame, so the result has to be assigned back
df = df.drop(del_columns_name, axis=1)
Upvotes: 1
Views: 2652
Reputation: 18906
I think you can even specify the column indices right away. In this case you are interested in [0,1,2,3]. Consider this example, which also parses dates.
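from io import StringIO  # pd.compat.StringIO was removed in newer pandas versions

import pandas as pd

cols = ['station', 'date', 'observation', 'value']
data = '''\
1, 2018-01-01, 1, 1, 1, 1, 1, 1
2, 2018-01-02, 2, 2, 2, 2, 2, 2'''
file = StringIO(data)
# usecols picks columns by position; skipinitialspace strips the space after each comma
df = pd.read_csv(file, names=cols, usecols=[0,1,2,3], parse_dates=[1],
                 skipinitialspace=True)
print(df)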
Returns:
   station       date  observation  value
0        1 2018-01-01            1      1
1        2 2018-01-02            2      2
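As a quick sanity check, here is a minimal sketch comparing the one-step read against the read-then-drop approach from the question. The sample data and the variable names (all_names, unwanted, two_step, one_step) are made up for illustration and are not from the original post.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: eight unlabeled columns, no header row.
data = "1,2018-01-01,1,1,1,1,1,1\n2,2018-01-02,2,2,2,2,2,2\n"

all_names = ['station', 'date', 'observation', 'value',
             'other_1', 'other_2', 'other_3', 'other_4']
unwanted = ['other_1', 'other_2', 'other_3', 'other_4']

# Two-step approach from the question: read everything, then drop.
two_step = pd.read_csv(StringIO(data), names=all_names).drop(columns=unwanted)

# One-step approach: read only the first four columns by position.
one_step = pd.read_csv(StringIO(data), names=all_names[:4], usecols=[0, 1, 2, 3])

# Prints whether both approaches produced the same DataFrame.
print(one_step.equals(two_step))
```

The unwanted columns are never loaded in the one-step version, which should also save some memory on wide files.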
Upvotes: 2
Reputation: 164623
One way is to use your two lists to resolve the indices and column names required. Then use the usecols and names arguments of pd.read_csv to specify column indices and names respectively.
idx, cols = zip(*((i, x) for i, x in enumerate(columns_name)
                  if x not in del_columns_name))
df = pd.read_csv('filename', usecols=idx, names=cols, header=None)
As explained in the docs, you should also specify header=None explicitly when no header exists.
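For anyone who wants to try this without a file on disk, here is a rough sketch of the same idea split into separate steps. The in-memory data string stands in for 'filename' and the intermediate name kept is my own, so treat both as assumptions rather than part of the original answer.

```python
from io import StringIO

import pandas as pd

columns_name = ['station', 'date', 'observation', 'value',
                'other_1', 'other_2', 'other_3', 'other_4']
del_columns_name = ['other_1', 'other_2', 'other_3', 'other_4']

# Pair each name with its position, keeping only the wanted columns.
kept = [(i, name) for i, name in enumerate(columns_name)
        if name not in del_columns_name]

# Split the pairs into one tuple of positions and one tuple of names.
idx, cols = zip(*kept)

# Hypothetical stand-in for 'filename': eight columns, no header row.
data = "1,2018-01-01,1,1,1,1,1,1\n2,2018-01-02,2,2,2,2,2,2\n"

df = pd.read_csv(StringIO(data), usecols=list(idx), names=list(cols),
                 header=None)
print(df.columns.tolist())
```

Converting idx and cols to lists is only a precaution here; the unpacking step assumes at least one column is kept.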
Explanation
Use a generator expression to iterate over columns_name and keep only the items not in del_columns_name.
Use enumerate to extract indices.
Use zip to create separate tuples for indices and column names.
Upvotes: 2