Reputation: 984
I have a dataframe as follow...
Word Count Team Sex Code
0 develop 9 1 M P45.01
1 Effective 7 NaN M NaN
2 professional 8 NaN M NaN
3 approach 5 NaN M NaN
4 raster 34 NaN M NaN
5 Sad 55 NaN M NaN
6 water 2 NaN M NaN
7 soil 7 NaN M NaN
8 farming 9 NaN M NaN
9 deep 12 NaN M NaN
I want to fill NaN in column "Team" with 1, 2, 3, 4, 5, ....
incrementally and same for column "Code" like this: P45.01, P46.01, P47.01, P48.01, ...
. See the final dataframe below...
NOTE: Team is Number while Code is a String column type
df['Team'].fillna(df['Team'][0]+1)
df.fillna(method='ffill')
Upvotes: 1
Views: 2935
Reputation: 1258
I realised you could also do one of the following as well, given your updated dataframe
here:
import re
def count(n):
x = 0
while x < n:
yield x
x += 1
def populate(s, step):
chars = re.split('(\d*)', s)
number = int(chars[1]) + step
chars[1] = str(number)
return ''.join(chars)
number_generator = count(10)
number_generator2 = count(10)
# option 1 for `team` column
df['Team'] = df.index + 1 # just use your indices.
# option 2 for 'team' column
df.loc[:, 'Team'] = df.Team.apply(lambda _: next(number_generator)) # use generator.
# output
Word Count Team Sex Code
0 develop 9 1 M P45.01
1 Effective 7 2 M NaN
2 professional 8 3 M NaN
3 approach 5 4 M NaN
4 raster 34 5 M NaN
5 Sad 55 6 M NaN
6 water 2 7 M NaN
7 soil 7 8 M NaN
8 farming 9 9 M NaN
9 deep 12 10 M NaN
# we can use generator again for `Code` column
df.loc[:, 'Code']df.Code.apply(lambda _: populate(df.loc[0, 'Code'], next(number_generator2)))
# output
Word Count Team Sex Code
0 develop 9 1 M P45.01
1 Effective 7 2 M P46.01
2 professional 8 3 M P47.01
3 approach 5 4 M P48.01
4 raster 34 5 M P49.01
5 Sad 55 6 M P50.01
6 water 2 7 M P51.01
7 soil 7 8 M P52.01
8 farming 9 9 M P53.01
9 deep 12 10 M P54.01
One thing we need to be careful is that generator is generating number on the fly and can only be used once; but it does the job.
Upvotes: 0
Reputation: 862511
I believe you can create range and add it to first values, for Sex
use ffill
:
r = np.arange(len(df))
df['Team'] = df.loc[0, 'Team'] + r
df['Code'] = df.loc[0, 'Code'] + r
df['Sex'] = df['Sex'].ffill()
print (df)
Word Count Team Sex Code
0 develop 9 1.0 M 45.01
1 Effective 7 2.0 M 46.01
2 professional 8 3.0 M 47.01
3 approach 5 4.0 M 48.01
4 raster 34 5.0 M 49.01
5 Sad 55 6.0 M 50.01
6 water 2 7.0 M 51.01
7 soil 7 8.0 M 52.01
8 farming 9 9.0 M 53.01
9 deep 12 10.0 M 54.01
More general with broadcasting:
cols = ['Team','Code']
r = np.arange(len(df))
df[cols] = r[:, None] + df.loc[0, cols].values
df['Sex'] = df['Sex'].ffill()
print (df)
Word Count Team Sex Code
0 develop 9 1.0 M 45.01
1 Effective 7 2.0 M 46.01
2 professional 8 3.0 M 47.01
3 approach 5 4.0 M 48.01
4 raster 34 5.0 M 49.01
5 Sad 55 6.0 M 50.01
6 water 2 7.0 M 51.01
7 soil 7 8.0 M 52.01
8 farming 9 9.0 M 53.01
9 deep 12 10.0 M 54.01
EDIT:
If there is string only before float you can extract it to df1
, add range and last add prefix:
r = np.arange(len(df))
df['Team'] = (df.loc[0, 'Team'] + r).astype(int)
df1 = df.loc[[0], 'Code'].str.extract('(\D+)(\d+\.\d+)', expand=False)
print (df1)
0 1
0 P 45.01
df['Code'] = float(df1.loc[0, 1]) + r
df['Code'] = df1.loc[0, 0] + df['Code'].astype(str)
df['Sex'] = df['Sex'].ffill()
print (df)
Word Count Team Sex Code
0 develop 9 1 M P45.01
1 Effective 7 2 M P46.01
2 professional 8 3 M P47.01
3 approach 5 4 M P48.01
4 raster 34 5 M P49.01
5 Sad 55 6 M P50.01
6 water 2 7 M P51.01
7 soil 7 8 M P52.01
8 farming 9 9 M P53.01
9 deep 12 10 M P54.01
EDIT:
Error seems there is no first index value 0
but something else. So is possible use iloc
which select by positions:
r = np.arange(len(df))
df['Team'] = (df.iloc[0, df.columns.get_loc('Team')] + r).astype(int)
df1 = df.iloc[[0], df.columns.get_loc('Code')].str.extract('(\D+)(\d+\.\d+)', expand=False)
df['Code'] = float(df1.loc[0, 1]) + r
df['Code'] = df1.loc[0, 0] + df['Code'].astype(str)
df['Sex'] = df['Sex'].ffill()
print (df)
Word Count Team Sex Code
0 develop 9 1 M P45.01
1 Effective 7 2 M P46.01
2 professional 8 3 M P47.01
3 approach 5 4 M P48.01
4 raster 34 5 M P49.01
5 Sad 55 6 M P50.01
6 water 2 7 M P51.01
7 soil 7 8 M P52.01
8 farming 9 9 M P53.01
9 deep 12 10 M P54.01
Upvotes: 4
Reputation: 69
df.insert(0,'Team',range(1,1+len(df)))
This will work with 1 and 2 existing
or
df=df.assign(Team=[i for i in xrange(len(df))])[['Team']+df.columns.tolist()]
Or ultimately
df=df.reset_index()
df['Team']=df.index+1
Havent tested it but it should work
Upvotes: 1