Dani Valverde

Reputation: 409

Adding a column to pandas data frame fills it with NA

I have this pandas dataframe:

          SourceDomain                           1  2         3
0  www.theguardian.com     profile.theguardian.com  1  Directed
1  www.theguardian.com  membership.theguardian.com  2  Directed
2  www.theguardian.com   subscribe.theguardian.com  3  Directed
3  www.theguardian.com            www.google.co.uk  4  Directed
4  www.theguardian.com        jobs.theguardian.com  5  Directed

I would like to add a new column which is a pandas series created like this:

Weights = Weights.value_counts()

However, when I try to add the new column using edgesFile[4] = Weights, the column is filled with NaN instead of the values:

          SourceDomain                           1  2         3   4
0  www.theguardian.com     profile.theguardian.com  1  Directed NaN
1  www.theguardian.com  membership.theguardian.com  2  Directed NaN
2  www.theguardian.com   subscribe.theguardian.com  3  Directed NaN
3  www.theguardian.com            www.google.co.uk  4  Directed NaN
4  www.theguardian.com        jobs.theguardian.com  5  Directed NaN

How can I add the new column while keeping the values? Thanks.

Dani

Upvotes: 1

Views: 1663

Answers (2)

Shahriar

Reputation: 13804

Here is a small example based on your question.

You can add a new column to an existing DataFrame by assigning a Series to a column name:

>>> import pandas as pd
>>> from pandas import DataFrame, Series
>>> df = DataFrame([[1,2,3],[4,5,6]], columns = ['A', 'B', 'C'])
>>> df
   A  B  C
0  1  2  3
1  4  5  6

>>> s = Series([7,8])
>>> s
0    7
1    8
dtype: int64

>>> df['D']=s
>>> df
   A  B  C  D
0  1  2  3  7
1  4  5  6  8

Or you can build a DataFrame from the Series and then concatenate the two:

>>> df = DataFrame([[1,2,3],[4,5,6]])
>>> df
   0  1  2
0  1  2  3
1  4  5  6

>>> s = DataFrame(Series([7,8])) # no column name is given, so the default name is 0
>>> s
   0
0  7
1  8

>>> df = pd.concat([df,s], axis=1)
>>> df
   0  1  2  0
0  1  2  3  7
1  4  5  6  8
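
Note that both approaches above work only because df and s share the default 0, 1 index. As a rough sketch (the index labels 10 and 11 and the name "D" are made up for illustration), pd.concat with axis=1 also aligns on index labels, so a Series with different labels still produces NaN unless its index is reset first:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

# A Series whose labels (10, 11) do not match df's index (0, 1).
s = pd.Series([7, 8], index=[10, 11], name="D")

# concat with axis=1 aligns on index labels, so unmatched rows become NaN.
misaligned = pd.concat([df, s], axis=1)

# Dropping the Series' own labels gives it a fresh 0..n-1 index that matches df.
aligned = pd.concat([df, s.reset_index(drop=True)], axis=1)

print(misaligned)
print(aligned)
```

So if the Series comes from something like value_counts(), its index is worth checking before concatenating.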

Hope this helps.

Upvotes: 0

unutbu

Reputation: 879471

You are getting NaNs because the index of Weights does not match up with the index of edgesFile. If you want Pandas to ignore Weights.index and just paste the values in order then pass the underlying NumPy array instead:

edgesFile[4] = Weights.values

Here is an example which demonstrates the difference:

In [14]: df = pd.DataFrame(np.arange(4)*10, index=list('ABCD'))

In [15]: df
Out[15]: 
    0
A   0
B  10
C  20
D  30

In [16]: s = pd.Series(np.arange(4), index=list('CDEF'))

In [17]: s
Out[17]: 
C    0
D    1
E    2
F    3
dtype: int64

Here we see Pandas aligning the index:

In [18]: df[4] = s

In [19]: df
Out[19]: 
    0   4
A   0 NaN
B  10 NaN
C  20   0
D  30   1

Here, Pandas simply pastes the values in s into the column:

In [20]: df[4] = s.values

In [21]: df
Out[21]: 
    0  4
A   0  0
B  10  1
C  20  2
D  30  3
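
One caveat: assigning .values only works when Weights has exactly as many entries as edgesFile has rows. Because Weights comes from value_counts(), its index holds the distinct values that were counted, so it will often be shorter than the frame. If what you actually want is to give each row the count of its own value (this is an assumption about your goal), one option is to map the counts back onto the column they were computed from. A minimal sketch, with made-up data and column names standing in for edgesFile:

```python
import pandas as pd

# Hypothetical data standing in for edgesFile; the column names are made up.
edges = pd.DataFrame({
    "source": ["a", "a", "a", "b", "b"],
    "target": ["x", "y", "x", "x", "z"],
})

# value_counts() returns a Series indexed by the distinct values,
# not by the row labels of edges.
counts = edges["target"].value_counts()

# map() looks up each row's target in counts.index, so every row
# receives the count that belongs to its own value.
edges["weight"] = edges["target"].map(counts)

print(edges)
```

Here the lookup is done by value rather than by position, so it does not matter that counts and edges have different lengths and different indexes.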

Upvotes: 2
