Alex Flanagan

Reputation: 587

Retrieving Unknown Column Names from DataFrame.apply

How can I retrieve column names from a call to DataFrame.apply without knowing them in advance?

What I'm trying to do is apply a mapping from column names to functions to arbitrary DataFrames. Those functions might return multiple columns. I would like to end up with a DataFrame that contains the original columns as well as the new ones, the number and names of which I don't know at build time.

Other solutions here are Series-based. I'd like to do the whole frame at once, if possible.

What am I missing here? Are the columns coming back from apply lost in destructuring unless I know their names? It looks like assign might be useful, but will likely require a lot of boilerplate.

import pandas as pd

def fxn(col):
    return pd.Series(col * 2, name=col.name+'2')

df = pd.DataFrame({'A': range(0, 10), 'B': range(10, 0, -1)})
print(df)

# [Edit:]
#    A   B
# 0  0  10
# 1  1   9
# 2  2   8
# 3  3   7
# 4  4   6
# 5  5   5
# 6  6   4
# 7  7   3
# 8  8   2
# 9  9   1

df = df.apply(fxn)
print(df)

# [Edit:]
# Observed: columns changed in-place.
#     A   B
# 0   0  20
# 1   2  18
# 2   4  16
# 3   6  14
# 4   8  12
# 5  10  10
# 6  12   8
# 7  14   6
# 8  16   4
# 9  18   2

df[['A2', 'B2']] = df.apply(fxn)
print(df)

# [Edit: I am doubling column values, so missing something, but the question about the column counts stands.]
# Expected: new columns added. How can I do this at runtime without knowing column names?
#     A   B  A2  B2
# 0   0  40   0  80
# 1   4  36   8  72
# 2   8  32  16  64
# 3  12  28  24  56
# 4  16  24  32  48
# 5  20  20  40  40
# 6  24  16  48  32
# 7  28  12  56  24
# 8  32   8  64  16
# 9  36   4  72   8

Upvotes: 2

Views: 1129

Answers (3)

Sabito

Reputation: 5075

Answer on behalf of the OP:

This code does what I wanted:

import pandas as pd

# Simulated business logic: for an input row, return a number of columns
# related to the input, and generate names for them, such that we don't
# know the shape of the output or the names of its columns before the call.
def fxn(row):
    # Use positional access; row[0] on a label-indexed Series is deprecated.
    length = row.iloc[0]
    indices = [row.index[0] + str(i) for i in range(0, length)]
    series = pd.Series([i for i in range(0, length)], index=indices)
    return series

# Sample data: 0 to 18, inclusive, counting by 2.
df1 = pd.DataFrame(list(range(0, 20, 2)), columns=['A'])

# Randomize the rows to simulate different input shapes.
df1 = df1.sample(frac=1)

# Apply fxn to rows to get new columns (with expand). Concat to keep inputs.
df1 = pd.concat([df1, df1.apply(fxn, axis=1, result_type='expand')], axis=1)
print(df1)
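
To make the shape of the expanded result easier to see, here is a small deterministic sketch of the same idea. The two-row frame and the names `small` and `expanded` are illustrative assumptions, not part of the original answer. Rows that return fewer columns should be padded with NaN under the columns they did not produce.

```python
import pandas as pd

def fxn(row):
    # Same idea as above: the number of output columns depends on the row's value.
    length = row.iloc[0]
    labels = [row.index[0] + str(i) for i in range(length)]
    return pd.Series(list(range(length)), index=labels)

# Illustrative input: one row yields 1 column, the other yields 3.
small = pd.DataFrame({'A': [1, 3]})

expanded = small.apply(fxn, axis=1, result_type='expand')

# The column names are only known after the call; read them from the result.
print(sorted(expanded.columns))

# Row 0 produced only 'A0', so 'A1' and 'A2' are NaN there.
print(pd.concat([small, expanded], axis=1))
```

Reading `expanded.columns` after the call is what gives you the names at runtime.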

Upvotes: 0

You can assign directly with:

df[df.columns + '2'] = df.apply(fxn)

Output:

   A   B  A2  B2
0  0  10   0  20
1  1   9   2  18
2  2   8   4  16
3  3   7   6  14
4  4   6   8  12
5  5   5  10  10
6  6   4  12   8
7  7   3  14   6
8  8   2  16   4
9  9   1  18   2

Alternatively, building on @MayankPorwal's answer, you can call .add_suffix('2') on the output of apply before concatenating:

pd.concat([df, df.apply(fxn).add_suffix('2')], axis=1)

which will return the same output.


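In your function, name=col.name+'2' has no effect on the result (the function is effectively just returning col * 2). That's because DataFrame.apply labels each output column with the name of the input column it was computed from, not with the name of the Series your function returns.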
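
A quick way to check this, using the fxn and df from the question (the variable names `named` and `applied` are just for illustration):

```python
import pandas as pd

def fxn(col):
    return pd.Series(col * 2, name=col.name + '2')

df = pd.DataFrame({'A': range(0, 10), 'B': range(10, 0, -1)})

# Called directly, the returned Series does carry the new name.
named = fxn(df['A'])
print(named.name)

# Through apply, the result columns keep the input column labels.
applied = df.apply(fxn)
print(list(applied.columns))
```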

In any case, you can follow Mayank Porwal's approach: use pd.concat, then make the duplicated column names unique. Another possible way to do that:

# Use pd.concat as mentioned in the first answer from Mayank Porwal
df = pd.concat([df, df.apply(fxn)], axis=1)

# Rename duplicated columns
suffix = (pd.Series(df.columns).groupby(df.columns).cumcount()+1).astype(str)
# First occurrences get no suffix ('1' -> ''); repeats keep '2', '3', ...
df.columns = df.columns + suffix.replace('1', '')

which returns the same output and also handles any further duplicated columns.
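
For reference, here is a step-by-step sketch of that renaming on a tiny frame, so the intermediate values are visible. The frame and the names `combined`, `labels` and `occurrence` are assumptions for illustration; it groups the labels by themselves rather than by df.columns, which should be equivalent.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 2, 1]})

# Concatenating produces duplicated labels: A, B, A, B.
combined = pd.concat([df, df * 2], axis=1)

labels = pd.Series(combined.columns)

# Number each label by how many times it has appeared so far: 1, 1, 2, 2.
occurrence = labels.groupby(labels).cumcount() + 1

# Blank out the suffix for first occurrences, keep it for repeats.
suffix = occurrence.astype(str).replace('1', '')

combined.columns = (labels + suffix).tolist()
print(list(combined.columns))
```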

Upvotes: 3

Mayank Porwal

Reputation: 34086

You need to concat the result of your function with the original df.

Use pd.concat:

In [8]: x = df.apply(fxn) # Apply function on df and store result separately
In [10]: df = pd.concat([df, x], axis=1) # Concat with original df to get all columns

Rename duplicate column names by adding suffixes:

In [82]: from collections import Counter

In [38]: mylist = df.columns.tolist()
In [41]: d = {a:list(range(1, b+1)) if b>1 else '' for a,b in Counter(mylist).items()}

In [62]: df.columns = [i+str(d[i].pop(0)) if len(d[i]) else i for i in mylist]

In [63]: df
Out[63]: 
   A1  B1  A2  B2
0   0  10   0  20
1   1   9   2  18
2   2   8   4  16
3   3   7   6  14
4   4   6   8  12
5   5   5  10  10
6   6   4  12   8
7   7   3  14   6
8   8   2  16   4
9   9   1  18   2
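
The same concat idea can be extended to the general case described in the question, where a mapping of column names to functions is applied and each function may return one or several columns. This is only a sketch under assumptions: the functions `doubled` and `squared_and_cubed`, the `mapping` dict, and the generated column names are all hypothetical, and each function is assumed to name its own output.

```python
import pandas as pd

def doubled(col):
    # Returns a single, explicitly named Series.
    return (col * 2).rename(col.name + '_doubled')

def squared_and_cubed(col):
    # Returns several columns at once as a DataFrame.
    return pd.DataFrame({
        col.name + '_sq': col ** 2,
        col.name + '_cube': col ** 3,
    })

# Hypothetical mapping from column name to the function to run on it.
mapping = {'A': doubled, 'B': squared_and_cubed}

df = pd.DataFrame({'A': range(3), 'B': range(3, 0, -1)})

# Call each function on its column, then glue everything onto the original.
new_parts = [func(df[name]) for name, func in mapping.items()]
result = pd.concat([df, *new_parts], axis=1)

# The new names are discovered at runtime from the result itself.
new_columns = [c for c in result.columns if c not in df.columns]
print(new_columns)
```

Because each function is called directly instead of through DataFrame.apply, whatever names it returns are preserved, so nothing needs to be known ahead of time.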

Upvotes: 4
