Fox

Adding a column in pandas using a variable

I am trying to understand the difference between these two statements:

dataframe['newColumn'] = 'stringconst'

and

import pandas as pd
from io import StringIO

for x in y:
    if x == "value":
        csv = pd.read_csv(StringIO(table), header=None, names=None)
        dataframe['newColumn'] = csv[0]

In the first case pandas populates every row with the constant value, but in the second case it populates only the first row and assigns NaN to the rest of the rows. Why is this? How can I assign the value in the second case to all the rows in the DataFrame?

Answers (1)

juanpa.arrivillaga

Because csv[0] is not a scalar value; it's a pd.Series. When you assign a pd.Series to a column, pandas aligns it with the DataFrame by index label (the whole point of pandas), and every row whose label has no match in the Series gets NaN. You are probably getting NaN everywhere except the first row because only the first row's index label also appears in the index of csv[0]. To see this, consider two DataFrames that are identical except for the index, which in df2 is shifted by 20:

>>> df
   0  1  2  3  4
0  4 -5 -1  0  3
1 -2 -2  1  3  4
2  1  2  4  4 -4
3 -5  2 -3 -5  1
4 -5 -3  1  1 -1
5 -4  0  4 -3 -4
6 -2 -5 -3  1  0
7  4  0  0 -4 -4
8 -4  4 -2 -5  4
9  1 -2  4  3  0
>>> df2
    0  1  2  3  4
20  4 -5 -1  0  3
21 -2 -2  1  3  4
22  1  2  4  4 -4
23 -5  2 -3 -5  1
24 -5 -3  1  1 -1
25 -4  0  4 -3 -4
26 -2 -5 -3  1  0
27  4  0  0 -4 -4
28 -4  4 -2 -5  4
29  1 -2  4  3  0
>>> df['new'] = df[1]
>>> df
   0  1  2  3  4  new
0  4 -5 -1  0  3   -5
1 -2 -2  1  3  4   -2
2  1  2  4  4 -4    2
3 -5  2 -3 -5  1    2
4 -5 -3  1  1 -1   -3
5 -4  0  4 -3 -4    0
6 -2 -5 -3  1  0   -5
7  4  0  0 -4 -4    0
8 -4  4 -2 -5  4    4
9  1 -2  4  3  0   -2
>>> df['new2'] = df2[1]
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5   NaN
1 -2 -2  1  3  4   -2   NaN
2  1  2  4  4 -4    2   NaN
3 -5  2 -3 -5  1    2   NaN
4 -5 -3  1  1 -1   -3   NaN
5 -4  0  4 -3 -4    0   NaN
6 -2 -5 -3  1  0   -5   NaN
7  4  0  0 -4 -4    0   NaN
8 -4  4 -2 -5  4    4   NaN
9  1 -2  4  3  0   -2   NaN
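Applied to the question, the same mechanism explains the "first row only" result. The following is a minimal sketch, assuming `table` holds a single CSV row; `dataframe` and `table` are hypothetical stand-ins for the objects in the question. read_csv gives its one row the index label 0, so only row 0 of the three-row DataFrame finds a match:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-ins for the question's `dataframe` and `table`
dataframe = pd.DataFrame({"a": [10, 20, 30]})   # default index: 0, 1, 2
table = "hello,world\n"                         # assumed: a one-row CSV string

csv = pd.read_csv(StringIO(table), header=None)  # one row, index label 0
print(csv[0].index.tolist())                     # [0]

# csv[0] is a Series, so pandas aligns on index labels:
# label 0 gets "hello"; labels 1 and 2 have no match and become NaN
dataframe["newColumn"] = csv[0]
print(dataframe)
```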

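So, one thing you can do to fill the whole column is to assign the underlying values instead of the Series. A plain array is assigned by position rather than by index label, so this only works when its length matches the number of rows: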

>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5   NaN
1 -2 -2  1  3  4   -2   NaN
2  1  2  4  4 -4    2   NaN
3 -5  2 -3 -5  1    2   NaN
4 -5 -3  1  1 -1   -3   NaN
5 -4  0  4 -3 -4    0   NaN
6 -2 -5 -3  1  0   -5   NaN
7  4  0  0 -4 -4    0   NaN
8 -4  4 -2 -5  4    4   NaN
9  1 -2  4  3  0   -2   NaN
>>> df['new2'] = df2[1].values
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5    -5
1 -2 -2  1  3  4   -2    -2
2  1  2  4  4 -4    2     2
3 -5  2 -3 -5  1    2     2
4 -5 -3  1  1 -1   -3    -3
5 -4  0  4 -3 -4    0     0
6 -2 -5 -3  1  0   -5    -5
7  4  0  0 -4 -4    0     0
8 -4  4 -2 -5  4    4     4
9  1 -2  4  3  0   -2    -2
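A small self-contained sketch of the same idea, with hypothetical column names. It uses .to_numpy(), which returns the same underlying array as .values in recent pandas, shows a second option (relabelling the Series with the DataFrame's index via set_axis), and shows that positional assignment fails when the lengths differ:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})               # index 0, 1, 2
other = pd.Series([7, 8, 9], index=[20, 21, 22])  # same length, shifted index

df["aligned"] = other                # labels 20-22 don't exist in df -> all NaN
df["positional"] = other.to_numpy()  # plain array: assigned row by row, no alignment

# Alternative: give the Series df's index first, then alignment matches row for row
df["relabelled"] = other.set_axis(df.index)

# Positional assignment requires matching lengths (2 values vs 3 rows here)
mismatch_raised = False
try:
    df["too_short"] = other.iloc[:2].to_numpy()
except ValueError as exc:
    mismatch_raised = True
    print("length mismatch:", exc)
```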

Or, if you want to assign a single value, such as the first value in the first column, to every row, select that value explicitly with iloc (or another selector) so that you are assigning a scalar, which pandas broadcasts to all rows:

>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5    -5
1 -2 -2  1  3  4   -2    -2
2  1  2  4  4 -4    2     2
3 -5  2 -3 -5  1    2     2
4 -5 -3  1  1 -1   -3    -3
5 -4  0  4 -3 -4    0     0
6 -2 -5 -3  1  0   -5    -5
7  4  0  0 -4 -4    0     0
8 -4  4 -2 -5  4    4     4
9  1 -2  4  3  0   -2    -2
>>> df['newest'] = df2.iloc[0,0]
>>> df
   0  1  2  3  4  new  new2  newest
0  4 -5 -1  0  3   -5    -5       4
1 -2 -2  1  3  4   -2    -2       4
2  1  2  4  4 -4    2     2       4
3 -5  2 -3 -5  1    2     2       4
4 -5 -3  1  1 -1   -3    -3       4
5 -4  0  4 -3 -4    0     0       4
6 -2 -5 -3  1  0   -5    -5       4
7  4  0  0 -4 -4    0     0       4
8 -4  4 -2 -5  4    4     4       4
9  1 -2  4  3  0   -2    -2       4
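For the question's loop, the scalar approach might look like the sketch below, again assuming `table` is a one-row CSV and using hypothetical stand-ins for `dataframe` and `table`. Pulling out csv.iloc[0, 0] yields a plain value rather than a Series, so no index alignment happens:

```python
import pandas as pd
from io import StringIO

# Hypothetical stand-ins for the question's `dataframe` and `table`
dataframe = pd.DataFrame({"a": [10, 20, 30]})
table = "hello,world\n"   # assumed: a one-row CSV string

csv = pd.read_csv(StringIO(table), header=None)

# A scalar, not a Series; csv[0].iloc[0] or csv.at[0, 0] would work as well
value = csv.iloc[0, 0]

# Scalars are broadcast to every row, just like the 'stringconst' case
dataframe["newColumn"] = value
print(dataframe)
```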
