Reputation: 107

How can I create an index for a Python pandas DataFrame?

I am importing several CSV files into Python using Jupyter Notebook and pandas, and some of the resulting dataframes are created without a proper index column. Instead, the first column, which contains data that I need to manipulate, is used as the index. How can I create a regular index column as the first column? This seems like a trivial matter, but I can't find any useful help anywhere.

What my dataframe looks like

What my dataframe should look like

Upvotes: 5

Answers (5)

KLaz

Reputation: 782

Since you are reading some CSVs with an index and some without, and it apparently isn't known in advance which files have an index or what it is called, I would not use index_col in pandas.read_csv(). Setting it to False would ignore a (potentially) existing index, and setting it to True does not seem to work for this problem either, because the index names are either unknown or there is no index at all. I would also not call data.reset_index(inplace=True) directly, as suggested in other answers.

If data is the dataframe, I would start with this check:

if "Unnamed: 0" in data:
    data.drop("Unnamed: 0", axis=1, inplace=True)

because while trying to make this work, this unwanted index column might have been added to the data.
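To illustrate where such a column can come from, here is a minimal sketch. The sample data and column names are made up for the example, and an in-memory buffer stands in for a real file:

```python
import io
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
original = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [4, 22]})

# to_csv writes the index by default, with an empty header cell for it.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# Reading the file back without index_col turns that unnamed
# first column into a regular column called "Unnamed: 0".
data = pd.read_csv(buffer)
had_stray_column = "Unnamed: 0" in data.columns

# Same check as above, written without inplace.
if "Unnamed: 0" in data.columns:
    data = data.drop(columns="Unnamed: 0")
```

After the drop, data has only the original two columns again.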

To preserve the old indices, I would collect their names with data.index.name and then rename each of them to a common name with

data.rename(columns={"indexname1": "raw_index"}, inplace=True)
data.rename(columns={"indexname2": "raw_index"}, inplace=True)
....

for lineage.

Then,

data.reset_index(inplace=True)

would create a new index for each dataframe. If it is preferable to create new indices only for the dataframes that do not have one, you could run the reset_index command above for the dataframes without an index and call data.set_index('indexname1'), data.set_index('indexname2'), etc. for the rest (assigning the result back, since set_index does not modify the dataframe in place by default).
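As a rough sketch of the two directions, assuming a made-up two-column file in which the first data column ("name") ended up as the index:

```python
import io
import pandas as pd

# Hypothetical file contents; "name" and "score" are illustrative names.
csv_text = "name,score\nann,1\nbob,2\n"

# Simulate a frame whose first data column was used as the index.
data = pd.read_csv(io.StringIO(csv_text), index_col=0)

# reset_index() moves the old index back into a regular column
# and installs a fresh 0..n-1 index.
with_new_index = data.reset_index()

# set_index() goes the other way: it promotes a column to the index.
# It returns a new frame, so the result has to be assigned.
with_named_index = with_new_index.set_index("name")
```

Here with_new_index has the columns name and score plus a default integer index, which is the layout the question asks for.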

However, to make reading the data more sustainable and less tedious while maintaining lineage (e.g., the old indices), I would strongly suggest writing all dataframes back to files. Once the index has been normalized with the steps above and the first column really is the (new or old) index, this:

data.to_csv(filepath, index=True)

will make sure that from the next time on, the data can be read with:

data = pd.read_csv(filepath, index_col=0)

Thus all data frames will have their first column set as index, and this can make the rest of the code in the project less complex.
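A small round-trip sketch of that write-then-read pattern. The frame is made up, and an in-memory buffer stands in for filepath:

```python
import io
import pandas as pd

# Hypothetical frame with a default integer index.
data = pd.DataFrame({"value": [10, 20, 30]})

# The buffer plays the role of filepath in this sketch.
buffer = io.StringIO()
data.to_csv(buffer, index=True)  # index is written as the first column
buffer.seek(0)

# index_col=0 tells pandas to use that first column as the index again.
restored = pd.read_csv(buffer, index_col=0)
```

The restored frame has a single value column and the same index labels as before writing.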

Upvotes: 0

Isabely

Reputation: 1

Python 3.8.5

pandas==1.2.4

pd.read_csv('file.csv', header=None)

I found the solution in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Upvotes: -2

Suman Dey

Reputation: 151

Could you please try this:

df.reset_index(inplace=True, drop=True)

Let me know if this works.
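For reference, a minimal sketch of what drop=True changes, using a made-up frame whose index has gaps after filtering:

```python
import pandas as pd

# Hypothetical data; filtering leaves the index labels 1, 2, 3.
df = pd.DataFrame({"x": [5, 6, 7, 8]})
subset = df[df["x"] > 5]

# drop=True throws the old index away and renumbers from 0.
renumbered = subset.reset_index(drop=True)

# Without drop, the old index is kept as a new column named "index".
kept = subset.reset_index()
```

Note that drop=True discards the old index. If the old index holds data you still need, as the question describes, leaving out drop keeps it as a column.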

Upvotes: 8

Nicolas Gervais

Reputation: 36704

When you imported your csv, did you use the index_col argument? It should default to None, according to the documentation. If you don't use the argument, you should be fine.

Either way, you can force it not to use a column by using index_col=False. From the docs:

Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
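A short sketch of the malformed-file case from that note, with made-up contents where every data row ends in an extra delimiter:

```python
import io
import pandas as pd

# Hypothetical file: three header fields, but each data row
# has a trailing comma and therefore a fourth (empty) field.
csv_text = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# By default pandas treats the first column as the index here,
# because the rows have one more field than the header.
default = pd.read_csv(io.StringIO(csv_text))

# index_col=False keeps the first column as data and uses
# a plain 0..n-1 index instead.
forced = pd.read_csv(io.StringIO(csv_text), index_col=False)
```

In default, the values 4 and 8 become the index and the remaining values shift one column to the left. In forced, column a holds 4 and 8. Depending on the pandas version, the second call may emit a warning about the header length not matching the data.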

Upvotes: 0

Adam

Reputation: 367

When you are reading in the CSV, use pandas.read_csv(filepath, index_col=N), where N is the position or name of the column to use as the index. If the files don't have a proper index column, set index_col=False.
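A quick sketch of the three variants on a made-up, well-formed file (the column names id and name are only for illustration):

```python
import io
import pandas as pd

# Hypothetical file contents.
csv_text = "id,name\n101,ann\n102,bob\n"

# Pick the index column by position ...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# ... or by name.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="id")

# index_col=False: no column becomes the index, so pandas
# uses a default 0..n-1 index and keeps "id" as data.
no_index = pd.read_csv(io.StringIO(csv_text), index_col=False)
```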

To change the index of an existing DataFrame df, try df = df.reset_index() or df = df.set_index(col), where col is the name of the column to use as the index.

Upvotes: 2
