Umar Yusuf
Umar Yusuf

Reputation: 984

Python Pandas fill NaN with cell values of above cell incrementally

I have a dataframe as follow...

           Word  Count  Team Sex    Code
0       develop      9   1     M  P45.01
1     Effective      7   NaN   M     NaN
2  professional      8   NaN   M     NaN
3      approach      5   NaN   M     NaN
4        raster     34   NaN   M     NaN
5           Sad     55   NaN   M     NaN
6         water      2   NaN   M     NaN
7          soil      7   NaN   M     NaN
8       farming      9   NaN   M     NaN
9          deep     12   NaN   M     NaN

I want to fill NaN in column "Team" with 1, 2, 3, 4, 5, .... incrementally and same for column "Code" like this: P45.01, P46.01, P47.01, P48.01, .... See the final dataframe below...

NOTE: Team is Number while Code is a String column type

enter image description here

df['Team'].fillna(df['Team'][0]+1)
df.fillna(method='ffill')

Upvotes: 1

Views: 2935

Answers (3)

stucash
stucash

Reputation: 1258

I realised you could also do one of the following as well, given your updated dataframe here:

Setup

import re

def count(n):
    x = 0
    while x < n:
        yield x
        x += 1

def populate(s, step):
    chars = re.split('(\d*)', s)
    number = int(chars[1]) + step
    chars[1] = str(number)
    return ''.join(chars)

number_generator = count(10)
number_generator2 = count(10)

Main

# option 1 for `team` column
df['Team'] = df.index + 1 # just use your indices.

# option 2 for 'team' column
df.loc[:, 'Team'] = df.Team.apply(lambda _: next(number_generator)) # use generator.

# output

           Word  Count  Team Sex    Code
0       develop      9     1   M  P45.01
1     Effective      7     2   M     NaN
2  professional      8     3   M     NaN
3      approach      5     4   M     NaN
4        raster     34     5   M     NaN
5           Sad     55     6   M     NaN
6         water      2     7   M     NaN
7          soil      7     8   M     NaN
8       farming      9     9   M     NaN
9          deep     12    10   M     NaN

# we can use generator again for `Code` column
df.loc[:, 'Code']df.Code.apply(lambda _: populate(df.loc[0, 'Code'], next(number_generator2)))

# output 
           Word  Count  Team Sex    Code
0       develop      9     1   M  P45.01
1     Effective      7     2   M  P46.01
2  professional      8     3   M  P47.01
3      approach      5     4   M  P48.01
4        raster     34     5   M  P49.01
5           Sad     55     6   M  P50.01
6         water      2     7   M  P51.01
7          soil      7     8   M  P52.01
8       farming      9     9   M  P53.01
9          deep     12    10   M  P54.01

One thing we need to be careful is that generator is generating number on the fly and can only be used once; but it does the job.

Upvotes: 0

jezrael
jezrael

Reputation: 862511

I believe you can create range and add it to first values, for Sex use ffill:

r = np.arange(len(df))
df['Team'] = df.loc[0, 'Team'] + r
df['Code'] = df.loc[0, 'Code'] + r
df['Sex'] = df['Sex'].ffill()
print (df)
           Word  Count  Team Sex   Code
0       develop      9   1.0   M  45.01
1     Effective      7   2.0   M  46.01
2  professional      8   3.0   M  47.01
3      approach      5   4.0   M  48.01
4        raster     34   5.0   M  49.01
5           Sad     55   6.0   M  50.01
6         water      2   7.0   M  51.01
7          soil      7   8.0   M  52.01
8       farming      9   9.0   M  53.01
9          deep     12  10.0   M  54.01

More general with broadcasting:

cols = ['Team','Code']
r = np.arange(len(df))
df[cols] = r[:, None] + df.loc[0, cols].values
df['Sex'] = df['Sex'].ffill()
print (df)
           Word  Count  Team Sex   Code
0       develop      9   1.0   M  45.01
1     Effective      7   2.0   M  46.01
2  professional      8   3.0   M  47.01
3      approach      5   4.0   M  48.01
4        raster     34   5.0   M  49.01
5           Sad     55   6.0   M  50.01
6         water      2   7.0   M  51.01
7          soil      7   8.0   M  52.01
8       farming      9   9.0   M  53.01
9          deep     12  10.0   M  54.01

EDIT:

If there is string only before float you can extract it to df1, add range and last add prefix:

r = np.arange(len(df))
df['Team'] = (df.loc[0, 'Team'] + r).astype(int)
df1 = df.loc[[0], 'Code'].str.extract('(\D+)(\d+\.\d+)', expand=False)
print (df1)
   0      1
0  P  45.01

df['Code'] = float(df1.loc[0, 1]) + r
df['Code'] = df1.loc[0, 0] + df['Code'].astype(str)
df['Sex'] = df['Sex'].ffill()
print (df)

           Word  Count  Team Sex    Code
0       develop      9     1   M  P45.01
1     Effective      7     2   M  P46.01
2  professional      8     3   M  P47.01
3      approach      5     4   M  P48.01
4        raster     34     5   M  P49.01
5           Sad     55     6   M  P50.01
6         water      2     7   M  P51.01
7          soil      7     8   M  P52.01
8       farming      9     9   M  P53.01
9          deep     12    10   M  P54.01

EDIT:

Error seems there is no first index value 0 but something else. So is possible use iloc which select by positions:

r = np.arange(len(df))
df['Team'] = (df.iloc[0, df.columns.get_loc('Team')] + r).astype(int)
df1 = df.iloc[[0], df.columns.get_loc('Code')].str.extract('(\D+)(\d+\.\d+)', expand=False)

df['Code'] = float(df1.loc[0, 1]) + r
df['Code'] = df1.loc[0, 0] + df['Code'].astype(str)
df['Sex'] = df['Sex'].ffill()
print (df)

           Word  Count  Team Sex    Code
0       develop      9     1   M  P45.01
1     Effective      7     2   M  P46.01
2  professional      8     3   M  P47.01
3      approach      5     4   M  P48.01
4        raster     34     5   M  P49.01
5           Sad     55     6   M  P50.01
6         water      2     7   M  P51.01
7          soil      7     8   M  P52.01
8       farming      9     9   M  P53.01
9          deep     12    10   M  P54.01

Upvotes: 4

funkyFunk
funkyFunk

Reputation: 69

df.insert(0,'Team',range(1,1+len(df)))

This will work with 1 and 2 existing

or

df=df.assign(Team=[i for i in xrange(len(df))])[['Team']+df.columns.tolist()]

Or ultimately

df=df.reset_index()
df['Team']=df.index+1

Havent tested it but it should work

Upvotes: 1

Related Questions