Reputation: 3855
Let's say I have a dataframe df and I would like to create a new column filled with 0. I use:
df['new_col'] = 0
So far, no problem. But if the value I want to use is a list, it doesn't work:
df['new_col'] = my_list
ValueError: Length of values does not match length of index
I understand why this doesn't work (pandas is trying to assign one value of the list per cell of the column), but how can we avoid this behavior? To be clear, I would like every cell of my new column to contain the same predefined list.
Note: I also tried df.assign(new_col=my_list), with the same problem.
Upvotes: 33
Views: 56639
Reputation: 133
You can use DataFrame.apply:
In [1]:
df = pd.DataFrame([1, 2, 3], columns=['numbers'])
my_list = ['foo', 'bar']
df['lists'] = df.apply(lambda _: my_list, axis=1)
df
Out[1]:
   numbers       lists
0        1  [foo, bar]
1        2  [foo, bar]
2        3  [foo, bar]
Again, be aware that my_list is mutable and shared across the whole dataframe. To avoid that, you can make a copy for each row:
df['lists'] = df.apply(lambda _: my_list.copy(), axis=1)
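To see the difference between the two variants, here is a small sketch that checks object identity directly. The names shared and copied are just illustrative, and the results are kept as separate Series instead of being assigned to df, purely to keep the check short:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['numbers'])
my_list = ['foo', 'bar']

# Same object in every row: the lambda returns my_list itself each time.
shared = df.apply(lambda _: my_list, axis=1)
# Independent objects: each row gets its own shallow copy.
copied = df.apply(lambda _: my_list.copy(), axis=1)

print(shared.iloc[0] is shared.iloc[1])  # identity check on the shared variant
print(copied.iloc[0] is copied.iloc[1])  # identity check on the copied variant

# Mutating one copied cell leaves the other rows and my_list untouched.
copied.iloc[0].append('baz')
print(copied.tolist())
print(my_list)
```

Keep in mind that list.copy() is a shallow copy, so any mutable objects nested inside my_list would still be shared between rows.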
Upvotes: 2
Reputation: 34016
Note that the accepted answer may lead to surprising behavior if you want to modify those lists:
df = pd.DataFrame([1, 2, 3], columns=['a'])
df['lists'] = [[]] * len(df)
df
   a lists
0  1    []
1  2    []
2  3    []
df.loc[df.a == 1, 'lists'][0].append('1')
df
   a lists
0  1   [1]
1  2   [1]
2  3   [1]
# oops: every row holds a reference to the same list object
To avoid this you must initialize the lists column with a different list instance per row:
df['lists'] = [[] for r in range(len(df))] # note you can't use a generator
df.loc[df.a == 1, 'lists'][0].append('1')
df
   a lists
0  1   [1]
1  2    []
2  3    []
Don't be fooled by the display there; that 1 is still a string:
df.loc[df.a == 1, 'lists'][0]
['1']
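The same idea applies to the case in the question, where every cell should start out with the same predefined (non-empty) list. A minimal sketch, assuming my_list is the list from the question and using .at to reach a single cell by label:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3], columns=['a'])
my_list = ['foo', 'bar']  # stand-in for the predefined list from the question

# One independent (shallow) copy of my_list per row.
df['new_col'] = [list(my_list) for _ in range(len(df))]

# .at returns the list object stored in that single cell,
# so appending to it only changes that row.
df.at[0, 'new_col'].append('baz')

print(df)
print(my_list)
```

Only row 0 picks up the extra element, and my_list itself is left as it was.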
Upvotes: 14
Reputation: 393943
You'd have to do:
df['new_col'] = [my_list] * len(df)
Example:
In [13]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[13]:
          a         b         c
0 -0.010414  1.859791  0.184692
1 -0.818050 -0.287306 -1.390080
2 -0.054434  0.106212  1.542137
3 -0.226433  0.390355  0.437592
4 -0.204653 -2.388690  0.106218
In [17]:
df['b'] = [[234]] * len(df)
df
Out[17]:
          a      b         c
0 -0.010414  [234]  0.184692
1 -0.818050  [234] -1.390080
2 -0.054434  [234]  1.542137
3 -0.226433  [234]  0.437592
4 -0.204653  [234]  0.106218
Note that DataFrames are optimised for scalar values. Storing non-scalar values defeats the point, in my opinion: filtering, looking up, getting, and setting all become problematic, to the point that it becomes a pain.
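As a rough illustration of the filtering problem, here is a sketch with a hypothetical tags column whose cells are lists. A membership test cannot be written as a plain vectorised comparison and falls back to a Python-level call per row:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
# Hypothetical list-valued column, different per row for the sake of the filter.
df['tags'] = [['foo', 'bar'], ['bar'], ['foo']]

# Each cell is a Python list, so the check runs once per row via apply.
has_foo = df['tags'].apply(lambda tags: 'foo' in tags)
filtered = df[has_foo]

print(filtered)
```

This works, but it runs as ordinary Python code for every row, which is the kind of overhead the note above is warning about.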
Upvotes: 28