Reputation: 1604
import pandas as pd
DATA = pd.read_csv(url)
DATA.head()
I have a large dataset that has dozens of columns. After loading it into Colab as above, I can see the name of each column. But running DATA.columns just returns Index([], dtype='object'). What's happening here?
Now I find it impossible to pick out a few columns without column names. One way is to specify names=[...] when I load it, but I'm reluctant to do that since there are too many columns. So I'm looking for a way to index columns by integer position, like in R, where df[, c(1, 2, 3)] would simply give me the first three columns of a data frame. Pandas seems to focus on column names and makes integer indexing very inconvenient, though.
So what I'm asking is: (1) What did I do wrong? Can I get the column names when I load the dataframe? (2) If not, how can I pick out the columns at positions [0, 1, 10] using a list of integers?
It seems that the problem is in the loading, as DATA.shape returns (10000, 0). I reran the loading code a few times and, all of a sudden, things went back to normal. Maybe Colab was taking a nap or something?
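A quick sanity check right after loading would have caught this sooner. Here is a minimal sketch, assuming a hypothetical helper load_checked and a tiny in-memory CSV standing in for the real url (neither is from my actual notebook):

```python
import io
import pandas as pd

def load_checked(source):
    """Read a CSV and fail loudly if no columns were parsed.

    load_checked is a hypothetical helper name, not a pandas function.
    """
    df = pd.read_csv(source)
    if df.shape[1] == 0:
        raise ValueError(f"loaded {df.shape[0]} rows but no columns")
    return df

# Made-up sample data used in place of the real url.
sample = io.StringIO("x,y\n1,2\n3,4\n")
DATA = load_checked(sample)

print(DATA.shape)          # (rows, columns)
print(list(DATA.columns))  # parsed column names
```

It does not explain why the first load came back without columns, but it makes a bad load fail immediately instead of surfacing later.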
Upvotes: 0
Views: 294
Reputation: 233
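You can do that with df.iloc[:, [1, 2, 3]], which selects columns by integer position (df.loc[:, [1, 2, 3]] is label-based and only works if the column labels are themselves the integers 1, 2, 3). But I would suggest using the names, because if the columns ever change order or you insert new columns, position-based code can break.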
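To illustrate both options, here is a small sketch on a made-up frame (the CSV text and the column names a, b, c, d are assumptions for the example, not your data). It picks columns by position, and also shows how to look the names up from positions so the rest of the code can stay name-based:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV; column names are made up.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"
df = pd.read_csv(io.StringIO(csv_text))

# Position-based selection: columns 0, 1 and 3, whatever they are called.
by_position = df.iloc[:, [0, 1, 3]]

# Look the names up from positions once, then select by label.
names = df.columns[[0, 1, 3]]
by_name = df[names]

print(list(names))
print(by_position.equals(by_name))
```

Resolving positions to names once, near the load, keeps the later code readable and makes a changed column order easier to spot.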
Upvotes: 1