Reputation: 420
mean chart:
interval gross(mean)
(1920, 1925] NaN
(1925, 1930] 3.443000e+06
(1930, 1935] 4.746000e+05
(1935, 1940] 2.011249e+06
i have a huge dataframe(df) which has some Nan values in gross columns Now i want to fill those Nan values from mean chart according to respective interval.
df:
name gross interval
k 1000 (1935, 1940]
l Nan (1950, 1955]
,,,
here interval is categorical index.
Upvotes: 0
Views: 281
Reputation: 862511
You can create new Series
by map
and then replace NaN
s by combine_first
.
Main advantage is no necessary helper column, which is necessary remove later.
df1=pd.DataFrame({'gross(mean)':[np.nan,3.443000e+06, 4.746000e+05, 2.011249e+06, 10,20,30],
'interval':[1922,1927,1932, 1938,1932,1938,1953]})
df1['interval'] = pd.cut(df1['interval'], bins=[1920,1925,1930,1935,1940,1945,1950,1955])
print (df1)
gross(mean) interval
0 NaN (1920, 1925]
1 3443000.0 (1925, 1930]
2 474600.0 (1930, 1935]
3 2011249.0 (1935, 1940]
4 10.0 (1930, 1935]
5 20.0 (1935, 1940]
6 30.0 (1950, 1955]
df = pd.DataFrame({'name':['k','l'],
'gross':[1000, np.nan],
'interval':[1938, 1952]}, columns=['name','gross','interval'])
df['interval'] = pd.cut(df['interval'], bins=[1925,1930,1935,1940,1945,1950,1955])
print (df)
name gross interval
0 k 1000.0 (1935, 1940]
1 l NaN (1950, 1955]
mapped = df['interval'].map(df1.set_index('interval')['gross(mean)'].to_dict())
print (mapped)
0 20.0
1 30.0
Name: interval, dtype: float64
df['gross'] = df['gross'].combine_first(mapped)
print (df)
name gross interval
0 k 1000.0 (1935, 1940]
1 l 30.0 (1950, 1955]
Upvotes: 1
Reputation: 7967
You can add a column to the dataframe with the corresponding mean value using your mean chart (you can do a left join using pd.merge
by joining on the interval
column). Once you have this column, you can use -
df['gross'].fillna(df['means'])
Upvotes: 1