user27009853

Using pandas `.assign()` to make a column that contains a string scalar

I recently stumbled upon the `.assign()` DataFrame method and love how cleanly it expresses creating new columns. It's very intuitive for columns that are functions of other columns and objects. However, assigning a string scalar returns NaN for the entire column. That seemed to make sense from the documentation, which says the method takes keyword arguments whose values are callables or Series, but even when I use a lambda to wrap the string in a function, it still returns a column of NaN values.


str_scalar = "Hello"

df = df.assign(str_scalar_col=str_scalar)
# column str_scalar_col is all NaN

df = df.assign(str_scalar_col=lambda x: str_scalar)
# column str_scalar_col is still all NaN

Maybe this has to do with the type of the column created by default?

Normally I would just assign the column in place, but I'm curious whether `.assign()` can create a string scalar column.


df['str_scalar'] = "Hello"

# column str_scalar is all "Hello"

Upvotes: 0

Views: 55

Answers (1)

mozway

Reputation: 260865

`df = df.assign(str_scalar_col = lambda: str_scalar)` can't work, since pandas passes the DataFrame to the callable as its argument and a zero-argument lambda cannot accept it. You should get a TypeError.
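A minimal sketch of that failure, assuming a small hypothetical DataFrame with a single `col` column (the question does not show its data, so this frame is only an illustration):

```python
import pandas as pd

# Hypothetical sample data; the original DataFrame is not shown in the question.
df = pd.DataFrame({"col": range(5)})
str_scalar = "Hello"

try:
    # pandas calls the lambda with the DataFrame, but it accepts no arguments.
    df.assign(str_scalar_col=lambda: str_scalar)
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```

The exact error message depends on the Python version, so it is not reproduced here.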

The correct syntax to use a function/lambda would be:

df.assign(str_scalar_col = lambda x: str_scalar)

   col str_scalar_col
0    0          Hello
1    1          Hello
2    2          Hello
3    3          Hello
4    4          Hello
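For what it's worth, a plain string scalar passed directly should be broadcast by `.assign()` as well, without needing a lambda. Here is a small sketch comparing both forms, again assuming a hypothetical five-row DataFrame with a `col` column:

```python
import pandas as pd

# Hypothetical sample data; the question does not show its DataFrame.
df = pd.DataFrame({"col": range(5)})
str_scalar = "Hello"

# Scalar passed directly: pandas broadcasts it to every row.
direct = df.assign(str_scalar_col=str_scalar)

# Scalar wrapped in a one-argument callable: pandas calls it with the DataFrame.
wrapped = df.assign(str_scalar_col=lambda x: str_scalar)

print(direct)
print(direct.equals(wrapped))
```

If a column still comes back as NaN with either form, the cause is probably elsewhere, for example in how `df` was built, which the question does not show.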

If you want the output to vary per row, you can access the DataFrame's columns through the lambda's argument:

df.assign(str_scalar_col = lambda x: str_scalar + x['col'].astype(str))

   col str_scalar_col
0    0         Hello0
1    1         Hello1
2    2         Hello2
3    3         Hello3
4    4         Hello4

Upvotes: 0
