Eric

Reputation: 666

How to set the columns in a dataframe even when one of the columns doesn't exist?

I am compiling a list of dataframes from REST endpoints (i.e. from JSON results). In some cases, during the final steps, I get a KeyError when I select the final set of columns:

images_df = pd.concat(images)
images_df = images_df[list(cvpc.images_columns.keys())]
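It may help to note where the KeyError comes from: pd.concat aligns frames on the union of their columns (filling gaps with NaN), so the selection only fails for a column that is absent from every frame. A minimal sketch with two hypothetical per-endpoint frames:

```python
import pandas as pd

# Two hypothetical per-endpoint frames with partly different columns.
a = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
b = pd.DataFrame({"id": [3], "size": [10]})

# concat aligns on the union of the columns; cells missing in one frame become NaN.
images_df = pd.concat([a, b], ignore_index=True)
print(images_df)

# Only a label that appears in none of the frames ("tag" here) raises KeyError.
try:
    images_df[["id", "name", "size", "tag"]]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)
```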

Is there a way to select the columns so that any column that doesn't exist is simply created and filled with null values?

I've also tried selecting the columns before appending each dataframe to the list, i.e.:

temp_df = temp_df[list(cvpc.images_columns.keys())]
images.append(temp_df)

So being able to create the missing columns on the fly would be a huge win, since selecting the columns earlier keeps the final list of images as small as possible.
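If each temp_df is built from a list of JSON records (an assumption; the question doesn't show that step), pandas can do the column selection at construction time: passing columns= to pd.DataFrame keeps only those keys, in that order, and creates any missing one filled with NaN. A minimal sketch with hypothetical data, where images_columns stands in for cvpc.images_columns:

```python
import pandas as pd

# Hypothetical stand-in for cvpc.images_columns; only its keys are used.
images_columns = {"id": "int", "name": "str", "size": "int"}
final_columns = list(images_columns.keys())

# Hypothetical JSON pages, as they might come back from the REST endpoints.
pages = [
    [{"id": 1, "name": "x", "extra": "drop me"}],
    [{"id": 2, "size": 10}],
]

images = []
for records in pages:
    # columns= keeps only final_columns (dropping "extra") and
    # adds any key missing from the records as a NaN column.
    temp_df = pd.DataFrame(records, columns=final_columns)
    images.append(temp_df)

images_df = pd.concat(images, ignore_index=True)
print(images_df)
```

Because every frame already has exactly final_columns, the concatenated result needs no further selection.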

Here's a simple example:

import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df_t = pd.DataFrame(data)

final_columns = ['col_1', 'col_2', 'col_3']
df = df_t[final_columns]  # raises KeyError because 'col_3' does not exist

Any suggestions would be greatly appreciated.
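For the simple example, DataFrame.reindex(columns=...) is a built-in way to get this behaviour: unlike df_t[final_columns], it does not raise for missing labels; it creates them filled with NaN and returns the columns in the requested order. A minimal sketch:

```python
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df_t = pd.DataFrame(data)

final_columns = ['col_1', 'col_2', 'col_3']

# reindex creates col_3 (all NaN) instead of raising KeyError,
# and the result's columns follow the order of final_columns.
df = df_t.reindex(columns=final_columns)
print(df)
```

reindex also drops any column not listed in final_columns, which matches the intent of the selection in the question.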

Upvotes: 2

Views: 2091

Answers (3)

Umar.H

Reputation: 23099

You could build a dictionary of the missing columns (each mapped to NaN), unpack it into assign, and then slice the columns with your list as you did above.

import numpy as np
df = df_t.assign(
    **{col: np.nan for col in final_columns if col not in df_t.columns}
)[final_columns]

print(df)

   col_1 col_2  col_3
0      3     a    NaN
1      2     b    NaN
2      1     c    NaN
3      0     d    NaN
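The one-liner can be wrapped in a small helper so it can be applied to each frame before concatenation. ensure_columns is a hypothetical name, not a pandas function; since assign returns a new DataFrame, the original frame is left unchanged. A sketch:

```python
import numpy as np
import pandas as pd


def ensure_columns(df, columns):
    """Hypothetical helper: return a new frame with exactly `columns` (a list),
    adding any missing ones filled with NaN."""
    missing = {col: np.nan for col in columns if col not in df.columns}
    # assign returns a copy, so the caller's df is not modified.
    return df.assign(**missing)[columns]


df_t = pd.DataFrame({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']})
df = ensure_columns(df_t, ['col_1', 'col_2', 'col_3'])
print(df)
print(list(df_t.columns))  # ['col_1', 'col_2']
```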

Upvotes: 1

user2317421

You can do something like this:

import numpy as np
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df_t = pd.DataFrame(data)

final_columns = ['col_1', 'col_2', 'col_3']

for col in final_columns:
    if col not in df_t.columns:
        df_t[col] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
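This loop modifies df_t in place and appends each new column at the end, so the column order follows df_t rather than final_columns. Selecting with the list afterwards fixes the order. A sketch with col_3 deliberately requested first:

```python
import numpy as np
import pandas as pd

df_t = pd.DataFrame({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']})

# Same loop, but with col_3 requested first.
final_columns = ['col_3', 'col_1', 'col_2']
for col in final_columns:
    if col not in df_t.columns:
        df_t[col] = np.nan  # modifies df_t in place; new column goes at the end

print(list(df_t.columns))  # ['col_1', 'col_2', 'col_3']

# Selecting with the list puts the columns in the requested order.
df = df_t[final_columns]
print(list(df.columns))  # ['col_3', 'col_1', 'col_2']
```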

Upvotes: 2

wasif

Reputation: 15498

I would add the missing columns as empty columns filled with NaN values:

import numpy as np
import pandas as pd
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
df_t = pd.DataFrame(data)

final_columns = ['col_1', 'col_2', 'col_3']
for x in final_columns:
    if x not in df_t.columns:
        df_t[x] = np.nan

df = df_t[final_columns]

You can fill in the NaN values later as needed.
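For example, fillna accepts a dict mapping column names to fill values, so only the newly created columns need to be touched. The value 0 below is just a placeholder assumption; pick whatever default suits each column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'col_3': np.nan})

# Only col_3 is filled; columns not in the dict are left as they are.
df = df.fillna({'col_3': 0})
print(df)
```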

Upvotes: 0
