user40780

Reputation: 1920

How do I set a tuple value in a pandas DataFrame?

What I want to do should be very simple. Essentially, I have a dataframe and I need to assign a tuple value to a column.

For example:

pd_tmp = pd.DataFrame(np.random.rand(3,3))
pd_tmp["new_column"] = ("a",2)

I just need a new column in which every row holds the tuple. What should I do?

The code above raises the following error:

ValueError: Length of values does not match length of index

Upvotes: 14

Views: 24130

Answers (5)

Alexey K.

Reputation: 111

As mentioned, the trick is to put the tuple inside a list, [('a', 2)], and multiply the list by the number of rows, or to use apply with a lambda.
Here are some extra related cases:

If the tuple has only one element, add a trailing comma; without it the parentheses are only grouping and the value is not a tuple:

pd.DataFrame({'no_comma': [(1.9)], 'with_comma': [(1.9,)]})
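
A quick way to see the difference is to check the type of each stored value. This is only a minimal sketch that reuses the two column names above; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'no_comma': [(1.9)], 'with_comma': [(1.9,)]})

# Without the comma the parentheses only group, so the cell holds a float.
no_comma_value = df['no_comma'].iloc[0]
# With the comma the cell holds a one-element tuple.
with_comma_value = df['with_comma'].iloc[0]

print(type(no_comma_value).__name__, type(with_comma_value).__name__)
```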

To put a tuple into the index:

size = 3
# one index entry per row, so the number of rows must equal size
pd.DataFrame(np.random.rand(size, 3), index=[('a', 2)] * size)

Upvotes: 1

Stefano Paoli

Reputation: 71

I was looking for something similar, but in my case I wanted the tuple to be a combination of the existing columns, not just a fixed value. I found the solution below, which I share hoping it will be useful to others, like me.

In [24]: df
Out[24]:
      A     B
0     1     2
1    11    22
2   111   222
3  1111  2222

In [25]: df['D'] = df[['A','B']].apply(tuple, axis=1)

In [26]: df
Out[26]:
      A     B             D
0     1     2        (1, 2)
1    11    22      (11, 22)
2   111   222    (111, 222)
3  1111  2222  (1111, 2222)
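
Another option, offered only as a sketch and not as part of the approach above, is to build the tuples with Python's built-in zip, which avoids the row-wise apply. The frame below is rebuilt from the values shown in the output above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 11, 111, 1111], 'B': [2, 22, 222, 2222]})

# zip pairs the two columns element by element; list() turns the pairs
# into a list of tuples with one entry per row.
df['D'] = list(zip(df['A'], df['B']))

print(df)
```

Each cell of D then holds one tuple made from that row's A and B values.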

Upvotes: 6

piRSquared

Reputation: 294298

You can use apply with a lambda that returns the tuple:

pd_tmp.assign(new_column=pd_tmp.apply(lambda x: ('a', 2), 1))

          0         1         2 new_column
0  0.373564  0.806956  0.106911     (a, 2)
1  0.332508  0.711735  0.230347     (a, 2)
2  0.516232  0.343266  0.813759     (a, 2)

Upvotes: 4

Shihe Zhang

Reputation: 2771

From the documentation for Series:

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

>>> s = pd.Series(data, index=index)

Here, data can be many different things:

  • a Python dict
  • an ndarray
  • a scalar value (like 5)

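So a Series won't take a tuple as a single value directly; the tuple is read as a sequence of values.
@Psidom's answer works because it makes the tuple an element of a list, so each row receives the whole tuple.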
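
The difference can be checked with a short sketch. The names s_flat, s_tuples and raised are only illustrative, and the exact wording of the error message may vary between pandas versions.

```python
import numpy as np
import pandas as pd

# A bare tuple is read as two separate values, giving a Series of length 2.
s_flat = pd.Series(("a", 2))
print(len(s_flat))

# A list of tuples gives one tuple per row.
s_tuples = pd.Series([("a", 2)] * 3)
print(len(s_tuples), s_tuples.iloc[0])

# The same length mismatch is what the column assignment trips over:
# two values cannot fill three rows.
pd_tmp = pd.DataFrame(np.random.rand(3, 3))
try:
    pd_tmp["new_column"] = ("a", 2)
    raised = False
except ValueError:
    raised = True
print(raised)
```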

If you are asking how to set a single cell of a Series/DataFrame, that has already been asked as a separate question.

Upvotes: 2

akuiper

Reputation: 214957

You can wrap the tuples in a list:

import numpy as np
import pandas as pd
pd_tmp = pd.DataFrame(np.random.rand(3,3))
pd_tmp["new_column"] = [("a",2)] * len(pd_tmp)

pd_tmp
#          0           1           2    new_column
#0  0.835350    0.338516    0.914184    (a, 2)
#1  0.007327    0.418952    0.741958    (a, 2)
#2  0.758607    0.464525    0.400847    (a, 2)
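
If this is needed in more than one place, the same idea can be wrapped in a small helper. This is a sketch only: add_tuple_column is a hypothetical name, not a pandas function, and the string index is there just to show that the list is assigned by position.

```python
import numpy as np
import pandas as pd

def add_tuple_column(df, name, value):
    """Return a copy of df with a column whose every row holds `value`."""
    out = df.copy()
    # One list element per row, so the lengths always match.
    out[name] = [value] * len(out)
    return out

pd_tmp = pd.DataFrame(np.random.rand(3, 3), index=["x", "y", "z"])
result = add_tuple_column(pd_tmp, "new_column", ("a", 2))
print(result["new_column"].tolist())
```

Because the helper works on a copy, the original pd_tmp is left unchanged.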

Upvotes: 22
