Reputation: 1920
What I want to do should be very simple. I have a dataframe and need to assign the same tuple value to every row of a new column.
For example:
import numpy as np
import pandas as pd
pd_tmp = pd.DataFrame(np.random.rand(3,3))
pd_tmp["new_column"] = ("a",2)
This code raises:
ValueError: Length of values does not match length of index
I just need a new column where every row holds the tuple. What should I do?
Upvotes: 14
Views: 24130
Reputation: 111
As mentioned, the trick is to put the tuple inside a list, [('a', 2)], and multiply that list by the number of rows, or to use apply with a lambda.
Here are some extra related cases:
A tuple with a single element needs a trailing comma; without it, (1.9) is just the float 1.9:
pd.DataFrame({'no_comma': [(1.9)], 'with_comma': [(1.9,)]})
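The difference between the two columns can be checked directly. This is a minimal sketch; the variable names `no_comma`, `with_comma`, and `df_comma` are only illustrative.

```python
import pandas as pd

# Parentheses alone do not create a tuple; the trailing comma does.
no_comma = (1.9)
with_comma = (1.9,)

print(type(no_comma).__name__)    # float
print(type(with_comma).__name__)  # tuple

df_comma = pd.DataFrame({"no_comma": [no_comma], "with_comma": [with_comma]})

# The first column holds plain floats, the second holds tuple objects.
print(df_comma.dtypes)
print(type(df_comma["with_comma"].iloc[0]).__name__)  # tuple
```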
To use the tuple as the index label of every row (the number of rows must equal size):
size = 3
pd.DataFrame(np.random.rand(size, 3), index=[('a', 2)] * size)
Upvotes: 1
Reputation: 71
I was looking for something similar, but in my case I wanted the tuple to be a combination of existing columns rather than a fixed value. I am sharing the solution I found in case it is useful to others.
In [24]: df
Out[24]:
      A     B
0     1     2
1    11    22
2   111   222
3  1111  2222
In [25]: df['D'] = df[['A','B']].apply(tuple, axis=1)
In [26]: df
Out[26]:
      A     B             D
0     1     2        (1, 2)
1    11    22      (11, 22)
2   111   222    (111, 222)
3  1111  2222  (1111, 2222)
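For reference, here is a self-contained version of the session above, together with an alternative that builds the same tuples with `zip` instead of a row-wise `apply`. The column name `E` is only illustrative, and the `zip` variant is a suggestion, not part of the original solution.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 11, 111, 1111], "B": [2, 22, 222, 2222]})

# Row-wise apply, as in the session above.
df["D"] = df[["A", "B"]].apply(tuple, axis=1)

# Alternative sketch: zip the columns together; this avoids a Python-level
# function call per row and may be faster on large frames.
df["E"] = list(zip(df["A"], df["B"]))

print(df)
```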
Upvotes: 6
Reputation: 294298
You can use apply with a lambda that returns the tuple:
pd_tmp.assign(new_column=pd_tmp.apply(lambda x: ('a', 2), axis=1))
          0         1         2 new_column
0  0.373564  0.806956  0.106911     (a, 2)
1  0.332508  0.711735  0.230347     (a, 2)
2  0.516232  0.343266  0.813759     (a, 2)
Upvotes: 4
Reputation: 2771
From the documentation for Series:
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>>> s = pd.Series(data, index=index)
Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value (like 5)
So Series does not treat a bare tuple as a single value; it reads it as a sequence of elements.
@Psidom's answer makes the tuple an element of a list, so each row receives the whole tuple.
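A small sketch of that distinction; the names `bare` and `wrapped` are only for illustration.

```python
import pandas as pd

# A bare tuple is read as a sequence: one element per row.
bare = pd.Series(("a", 2))
print(len(bare))  # 2

# Wrapping the tuple in a list makes each tuple a single element.
wrapped = pd.Series([("a", 2)] * 3)
print(len(wrapped))     # 3
print(wrapped.iloc[0])  # ('a', 2)
```

This is also why the assignment in the question fails: the two-element tuple is compared against a three-row index.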
If you are asking how to set a single cell of a Series/DataFrame to a tuple, that question has already been asked.
Upvotes: 2
Reputation: 214957
You can wrap the tuples in a list:
import numpy as np
import pandas as pd
pd_tmp = pd.DataFrame(np.random.rand(3,3))
pd_tmp["new_column"] = [("a",2)] * len(pd_tmp)
pd_tmp
#           0         1         2 new_column
#0  0.835350  0.338516  0.914184     (a, 2)
#1  0.007327  0.418952  0.741958     (a, 2)
#2  0.758607  0.464525  0.400847     (a, 2)
Upvotes: 22