Reputation: 4244
From what I've read, it's easy to add and delete columns from a DataFrame, but I was wondering if there's already a method to do what I'm trying to achieve, in order to avoid reinventing the wheel.
Suppose I have the DataFrame x:
   a  b   c
0  1  5   8
1  2  6   9
2  3  7  10
I want to verify whether the column names correspond solely to the elements contained in a list l. If there are fewer elements in l than columns in x, I want the columns missing from l to be deleted.
For instance, if l = ["a", "b"], x would become:
   a  b
0  1  5
1  2  6
2  3  7
On the other hand, if there are more elements in l than columns in x, I want to create new, correspondingly named columns, with all the values in each new column set to 0.
For instance, if l = ["a", "b", "c", "d"], x would become:
   a  b   c  d
0  1  5   8  0
1  2  6   9  0
2  3  7  10  0
I could do a loop to check consistency between column names in x and elements in l, but is there anything more efficient than that?
Upvotes: 3
Views: 645
Reputation: 5437
Just use reindex (df.loc[:, l] raises a KeyError for missing labels in pandas 1.0 and later). The .astype(int) is added thanks to @Bill, if needed; note that it converts the whole DataFrame to ints:
df.reindex(columns=l).fillna(0).astype(int)
Case 1:
l = ["a", "b"]
df.reindex(columns=l).fillna(0).astype(int)
   a  b
0  1  5
1  2  6
2  3  7
Case 2:
l = ["a", "b", "c", "d"]
df.reindex(columns=l).fillna(0).astype(int)
   a  b   c  d
0  1  5   8  0
1  2  6   9  0
2  3  7  10  0
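A related option is to pass fill_value to DataFrame.reindex, so the new columns are created as 0 directly and no NaN filling or cast is needed. A minimal sketch, with df rebuilt here from the question's data as an assumption:

```python
import pandas as pd

# Assumed sample data, copied from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7], "c": [8, 9, 10]})

# Fewer names than columns: the unlisted column "c" is dropped.
fewer = df.reindex(columns=["a", "b"], fill_value=0)

# More names than columns: the new column "d" is created and filled with 0.
more = df.reindex(columns=["a", "b", "c", "d"], fill_value=0)

print(fewer)
print(more)
```

Because no NaN is introduced, the existing columns are not converted to floats along the way.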
Upvotes: 4
Reputation: 38415
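Again a function, but less complicated:
def df_from_list(df, l):
    # Add a zero-filled column for every name in l that df lacks.
    # Note that this modifies the df passed in.
    for i in l:
        if i not in df.columns:
            df[i] = 0
    # Return only the requested columns, in the order given by l.
    return df[l]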
Now call the function
l = ["a", "b","z"]
df_from_list(df, l)
You get
   a  b  z
0  1  5  0
1  2  6  0
2  3  7  0
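Since df[i] = 0 writes into the DataFrame that was passed in, the caller's df also gains the new columns. If that side effect is unwanted, one possible variant is sketched below. The name df_from_list_copy is hypothetical, and the sketch assumes the column names are strings, because assign takes them as keyword arguments:

```python
import pandas as pd

def df_from_list_copy(df, l):
    # Hypothetical helper, not part of the answer above.
    # Build {name: 0} for every requested column that df lacks.
    missing = {name: 0 for name in l if name not in df.columns}
    # assign returns a new DataFrame, so the caller's df is left unchanged.
    return df.assign(**missing)[l]

df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7], "c": [8, 9, 10]})
out = df_from_list_copy(df, ["a", "b", "z"])
print(out)
print(list(df.columns))  # the original columns are untouched
```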
Upvotes: 1
Reputation: 333
I wrote a simple function that does what you're looking for. The identification is done using set operations, but then it loops to create the new columns using insert. Perhaps there is a better way to do this than with a loop?
import pandas as pd

def func_df(df, l):
    # First find the intersection.
    intersect = set(df.columns).intersection(l)
    # Select with a list in the original column order (recent pandas
    # rejects a set as an indexer); copy so that insert does not touch
    # the caller's DataFrame.
    df = df.loc[:, [c for c in df.columns if c in intersect]].copy()
    # Now find the list elements that are not columns yet.
    additions = [c for c in l if c not in intersect]
    for i in additions:
        # Each new column goes in at position 0, filled with 0.
        df.insert(0, i, 0)
    return df

df = pd.DataFrame(
    [[1, 5, 8],
     [2, 6, 9],
     [3, 7, 10]], columns=['a', 'b', 'c'])
out = func_df(df, ['a', 'b', 'd', 'k'])
print(out)
   k  d  a  b
0  0  0  1  5
1  0  0  2  6
2  0  0  3  7
Upvotes: 1
Reputation: 11613
I think pd.concat might be one way to achieve this.
In [47]: import pandas as pd
In [48]: data = {
    ...:     'a': [1, 2, 3],
    ...:     'b': [5, 6, 7],
    ...:     'c': [8, 9, 10]
    ...: }
In [49]: x = pd.DataFrame(data)
In [50]: x
Out[50]:
   a  b   c
0  1  5   8
1  2  6   9
2  3  7  10
In [51]: l = ["a", "b"]
In [52]: x[l]
Out[52]:
   a  b
0  1  5
1  2  6
2  3  7
In [53]: l = ["a", "b", "c", "d"]
In [55]: y = pd.DataFrame(columns=l)
In [56]: y
Out[56]:
Empty DataFrame
Columns: [a, b, c, d]
Index: []
In [57]: pd.concat((x, y))
Out[57]:
     a    b     c    d
0  1.0  5.0   8.0  NaN
1  2.0  6.0   9.0  NaN
2  3.0  7.0  10.0  NaN
In [58]: pd.concat((x, y)).fillna(0)
Out[58]:
     a    b     c  d
0  1.0  5.0   8.0  0
1  2.0  6.0   9.0  0
2  3.0  7.0  10.0  0
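As the output shows, concatenating with an empty frame turns the existing columns into floats. A possible variation, sketched here under the assumption that only the missing columns need to be created, is to build a zero-filled DataFrame for them and concatenate along the column axis instead. The names missing, zeros and result are illustrative:

```python
import pandas as pd

x = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7], "c": [8, 9, 10]})
l = ["a", "b", "c", "d"]

# Names requested in l that x does not have yet.
missing = [name for name in l if name not in x.columns]

# A zero-filled frame that shares the index of x.
zeros = pd.DataFrame(0, index=x.index, columns=missing)

# Join side by side, then select the columns in the order given by l.
result = pd.concat([x, zeros], axis=1)[l]
print(result)
```

Selecting with [l] at the end also drops any column of x that is not listed, so the same expression should cover the shorter-list case.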
Upvotes: 1