Reputation: 3737
I have taken a dataframe with multiple rows per reference (because of a score), and have taken those scores and created a new dataframe which is one line per reference and the scores in a series. This is all working.
However, when I try to add the series dataframe back to the old dataframe, it is working, but is showing 'nan' for every cell.
import pandas as pd
df = pd.read_csv('GAPData.csv', encoding='cp1252')
df1 = df[['GrantRefNumber','Academic_Reviews']].copy()
dfGroup1 = df1.groupby('GrantRefNumber').apply(lambda x: list(x.Academic_Reviews))
dfGroup1 produces this...
GrantRefNumber
D/G001118 [5, 6, 6]
D/P041236 [5, 2, 6]
D/P753396 [2, 2, 6, 5]
D/P043434 [2, 5, 4, 5]
D/P034285 [4, 4, 6, 3]
Then I run:
df['Academic Reviews'] = dfGroup1.groupby(
'GrantRefNumber').apply(lambda x: list(x.Academic_Reviews))
df.drop_duplicates('GrantRefNumber', inplace=True)
Which all compiles, but I am left with 'nan' for every cell under the new column 'Academic reviews', not the series.
Any advice on what I am doing wrong?
Upvotes: 1
Views: 53
Reputation: 863116
There is problem your indexes are no aligned, so get all NaN
s in column.
I think you need set_index
with original df
and then join
:
df = df.set_index('GrantRefNumber')
df = df.join(dfGroup1.rename('Academic Reviews')).reset_index()
Sample:
df = pd.DataFrame({'GrantRefNumber':['D/G001118'] * 3 + ['D/P041236'] * 3 + ['D/P753396'] * 4,
'Academic_Reviews':[5, 6, 6, 5, 2, 6, 2, 2, 6, 5],
'another_data':[7, 8, 1, 8, 2, 8, 2, 0, 1, 5]})
print (df)
Academic_Reviews GrantRefNumber another_data
0 5 D/G001118 7
1 6 D/G001118 8
2 6 D/G001118 1
3 5 D/P041236 8
4 2 D/P041236 2
5 6 D/P041236 8
6 2 D/P753396 2
7 2 D/P753396 0
8 6 D/P753396 1
9 5 D/P753396 5
dfGroup1 = df.groupby('GrantRefNumber')['Academic_Reviews'].apply(list)
print (dfGroup1)
GrantRefNumber
D/G001118 [5, 6, 6]
D/P041236 [5, 2, 6]
D/P753396 [2, 2, 6, 5]
Name: Academic_Reviews, dtype: object
df = df.set_index('GrantRefNumber')
df = df.join(dfGroup1.rename('Academic_Reviews_list')).reset_index()
print (df)
GrantRefNumber Academic_Reviews another_data Academic_Reviews_list
0 D/G001118 5 7 [5, 6, 6]
1 D/G001118 6 8 [5, 6, 6]
2 D/G001118 6 1 [5, 6, 6]
3 D/P041236 5 8 [5, 2, 6]
4 D/P041236 2 2 [5, 2, 6]
5 D/P041236 6 8 [5, 2, 6]
6 D/P753396 2 2 [2, 2, 6, 5]
7 D/P753396 2 0 [2, 2, 6, 5]
8 D/P753396 6 1 [2, 2, 6, 5]
9 D/P753396 5 5 [2, 2, 6, 5]
Upvotes: 1