Reputation: 9444
I am trying to understand the difference between these two statements
dataframe['newColumn'] = 'stringconst'
and
for x in y:
    if x == "value":
        csv = pd.read_csv(StringIO(table), header=None, names=None)
        dataframe['newColumn'] = csv[0]
In the first case pandas populates all the rows with the constant value, but in the second case it populates only the first row and assigns NaN to the rest of the rows. Why is this? How can I assign the value in the second case to all the rows in the dataframe?
Upvotes: 0
Views: 40
Reputation: 95908
Because csv[0] is not a scalar value. It's a pd.Series, and when you assign a pd.Series, pandas tries to align it by index (the whole point of pandas). You are probably getting NaN everywhere except the first row because only the first row's index label matches the pd.DataFrame index. So, consider two data-frames (note: they are copies except for the index, which is shifted by 20):
>>> df
   0  1  2  3  4
0  4 -5 -1  0  3
1 -2 -2  1  3  4
2  1  2  4  4 -4
3 -5  2 -3 -5  1
4 -5 -3  1  1 -1
5 -4  0  4 -3 -4
6 -2 -5 -3  1  0
7  4  0  0 -4 -4
8 -4  4 -2 -5  4
9  1 -2  4  3  0
>>> df2
    0  1  2  3  4
20  4 -5 -1  0  3
21 -2 -2  1  3  4
22  1  2  4  4 -4
23 -5  2 -3 -5  1
24 -5 -3  1  1 -1
25 -4  0  4 -3 -4
26 -2 -5 -3  1  0
27  4  0  0 -4 -4
28 -4  4 -2 -5  4
29  1 -2  4  3  0
>>> df['new'] = df[1]
>>> df
   0  1  2  3  4  new
0  4 -5 -1  0  3   -5
1 -2 -2  1  3  4   -2
2  1  2  4  4 -4    2
3 -5  2 -3 -5  1    2
4 -5 -3  1  1 -1   -3
5 -4  0  4 -3 -4    0
6 -2 -5 -3  1  0   -5
7  4  0  0 -4 -4    0
8 -4  4 -2 -5  4    4
9  1 -2  4  3  0   -2
>>> df['new2'] = df2[1]
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5   NaN
1 -2 -2  1  3  4   -2   NaN
2  1  2  4  4 -4    2   NaN
3 -5  2 -3 -5  1    2   NaN
4 -5 -3  1  1 -1   -3   NaN
5 -4  0  4 -3 -4    0   NaN
6 -2 -5 -3  1  0   -5   NaN
7  4  0  0 -4 -4    0   NaN
8 -4  4 -2 -5  4    4   NaN
9  1 -2  4  3  0   -2   NaN
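To tie this back to the question, here is a minimal sketch of what is probably happening there. The frame contents and the name single are made up for illustration; the assumption is that csv[0] holds a single row, so its index contains only the label 0.

```python
import pandas as pd

# Hypothetical stand-ins for the objects in the question.
dataframe = pd.DataFrame({"a": [10, 20, 30]})  # index labels 0, 1, 2
single = pd.Series(["x"])                      # index label 0 only, like csv[0] from a one-row table

# Assignment aligns on index labels: only label 0 exists in `single`,
# so rows 1 and 2 have nothing to align with and become NaN.
dataframe["newColumn"] = single
print(dataframe)
```

Under that assumption, only the row whose label also appears in the Series receives a value, which matches the behaviour described in the question.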
So, one thing you can do to assign the whole column is to assign the underlying values, which carry no index and are therefore placed by position:
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5   NaN
1 -2 -2  1  3  4   -2   NaN
2  1  2  4  4 -4    2   NaN
3 -5  2 -3 -5  1    2   NaN
4 -5 -3  1  1 -1   -3   NaN
5 -4  0  4 -3 -4    0   NaN
6 -2 -5 -3  1  0   -5   NaN
7  4  0  0 -4 -4    0   NaN
8 -4  4 -2 -5  4    4   NaN
9  1 -2  4  3  0   -2   NaN
>>> df['new2'] = df2[1].values
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5    -5
1 -2 -2  1  3  4   -2    -2
2  1  2  4  4 -4    2     2
3 -5  2 -3 -5  1    2     2
4 -5 -3  1  1 -1   -3    -3
5 -4  0  4 -3 -4    0     0
6 -2 -5 -3  1  0   -5    -5
7  4  0  0 -4 -4    0     0
8 -4  4 -2 -5  4    4     4
9  1 -2  4  3  0   -2    -2
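As a rough sketch of the same idea on a smaller frame: positional assignment only works when the lengths match, and there is more than one way to drop or replace the index. The names left and right are hypothetical, and to_numpy() is used here as an alternative spelling of .values.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2, 3]})             # index labels 0, 1, 2
right = pd.Series([7, 8, 9], index=[20, 21, 22])  # same length, shifted index

# Option 1: hand pandas a plain NumPy array, so there is no index to align on.
left["by_position"] = right.to_numpy()

# Option 2: rebuild the Series with the target frame's index, so alignment succeeds.
left["relabelled"] = pd.Series(right.to_numpy(), index=left.index)

# A plain array of the wrong length cannot be placed by position.
try:
    left["too_short"] = right.to_numpy()[:2]
except ValueError as exc:
    length_error = str(exc)
    print("ValueError:", length_error)
```

Both options give the same column here; the difference is only whether the result of the right-hand side still carries an index.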
Or, if you want to assign the first value in the first column to every row, select that single value with iloc or another selector, and then assign it:
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5    -5
1 -2 -2  1  3  4   -2    -2
2  1  2  4  4 -4    2     2
3 -5  2 -3 -5  1    2     2
4 -5 -3  1  1 -1   -3    -3
5 -4  0  4 -3 -4    0     0
6 -2 -5 -3  1  0   -5    -5
7  4  0  0 -4 -4    0     0
8 -4  4 -2 -5  4    4     4
9  1 -2  4  3  0   -2    -2
>>> df['newest'] = df2.iloc[0,0]
>>> df
   0  1  2  3  4  new  new2  newest
0  4 -5 -1  0  3   -5    -5       4
1 -2 -2  1  3  4   -2    -2       4
2  1  2  4  4 -4    2     2       4
3 -5  2 -3 -5  1    2     2       4
4 -5 -3  1  1 -1   -3    -3       4
5 -4  0  4 -3 -4    0     0       4
6 -2 -5 -3  1  0   -5    -5       4
7  4  0  0 -4 -4    0     0       4
8 -4  4 -2 -5  4    4     4       4
9  1 -2  4  3  0   -2    -2       4
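Applied to the code in the question, that could look something like the sketch below. The contents of table and dataframe are invented for illustration, and it assumes the value you want is the first cell of the parsed CSV.

```python
from io import StringIO

import pandas as pd

table = "hello,1,2\n"                          # hypothetical one-row CSV text
dataframe = pd.DataFrame({"a": [10, 20, 30]})  # hypothetical target frame

csv = pd.read_csv(StringIO(table), header=None)

# csv[0] is a one-element Series; csv.iloc[0, 0] is the scalar inside it.
# A scalar has no index, so pandas broadcasts it to every row.
dataframe["newColumn"] = csv.iloc[0, 0]
print(dataframe)
```

This behaves like the 'stringconst' assignment in the question, because the right-hand side is a single value rather than a Series.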
Upvotes: 2