Reputation: 735
I want to fill empty cells with the previous row's value if that value starts with a number. For example, I have
Text Text
30 Text Text
Text Text
Text Text
31 Text Text
Text Text
31 Text Text
Text Text
Text Text
32 Text Text
Text Text
Text Text
Text Text
Text Text
Text Text
However, I want to have
Text Text
30 Text Text
30 Text Text
30 Text Text
31 Text Text
Text Text
31 Text Text
31 Text Text
31 Text Text
32 Text Text
Text Text
Text Text
Text Text
Text Text
Text Text
I tried to achieve this with the following code:
data = pd.read_csv('DATA.csv',sep='\t', dtype=object, error_bad_lines=False)
data = data.fillna(method='ffill', inplace=True)
print(data)
but it did not work.
Is there any way to do this?
Upvotes: 38
Views: 75974
Reputation: 1448
In general, if you want to fill empty cells with the previous row value, you can just use a recursive function like:
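import numpy as np
import pandas as pd

def same_as_upper(col: pd.Series) -> pd.Series:
    '''
    Recursively fill NaN rows with the previous value
    '''
    if col.isna().any():
        # Take the value one row up wherever the current value is NaN,
        # keeping the original index
        col = pd.Series(np.where(col.isna(), col.shift(1), col), index=col.index)
        return same_as_upper(col)
    else:
        return col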
Applying the function with df['A']=same_as_upper(df['A']), this simple case with your data would render:
A | B | C |
---|---|---|
Text | Text | nan |
30 | Text | Text |
30 | Text | Text |
30 | Text | Text |
31 | Text | Text |
Text | Text | nan |
31 | Text | Text |
31 | Text | Text |
31 | Text | Text |
32 | Text | Text |
Text | Text | nan |
Text | Text | Text |
Text | Text | Text |
Text | Text | Text |
Text | Text | Text |
However, you have the additional constraint of not replicating the previous value if it is not a number. In that case, one solution is to put a sentinel value in the appropriate location, apply the same function, and then replace the sentinel value with nan:
df.loc[11,'A']=999
df['A']=same_as_upper(df['A'])
df['A']=df['A'].replace(999,np.nan)
Result:
A | B | C |
---|---|---|
Text | Text | nan |
30 | Text | Text |
30 | Text | Text |
30 | Text | Text |
31 | Text | Text |
Text | Text | nan |
31 | Text | Text |
31 | Text | Text |
31 | Text | Text |
32 | Text | Text |
Text | Text | nan |
nan | Text | Text |
nan | Text | Text |
nan | Text | Text |
nan | Text | Text |
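The sentinel step above hard-codes row 11. As a rough sketch, the row could instead be located from the data, assuming that a blank should stay blank whenever the row directly above it holds non-numeric text. The names `SENTINEL` and `ffill_with_sentinel` and the sample values are hypothetical, and the built-in `ffill` is swapped in for the recursive helper:

```python
import numpy as np
import pandas as pd

SENTINEL = "__stop__"  # hypothetical marker, assumed never to occur in the data

def ffill_with_sentinel(col: pd.Series) -> pd.Series:
    prev = col.shift(1)
    # True where the row above exists but cannot be parsed as a number
    prev_is_text = prev.notna() & pd.to_numeric(prev, errors="coerce").isna()
    # Put the sentinel into blanks that directly follow a text row
    marked = col.mask(col.isna() & prev_is_text, SENTINEL)
    filled = marked.ffill()
    # Turn the propagated sentinel back into NaN
    return filled.mask(filled == SENTINEL)

# Made-up sample: NaN marks the empty cells
sample = pd.Series(["Text", "30", np.nan, np.nan, "31", "Text",
                    "31", np.nan, "32", "Text", np.nan, np.nan])
out = ffill_with_sentinel(sample)
print(out)
```

The blanks after "30" and "31" are filled, while the two blanks after the final "Text" remain NaN.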
Upvotes: 3
Reputation: 57033
First, replace your empty cells with NaNs:
df[df[0]==""] = np.nan
Now use ffill():
df.ffill()
# 0
#0 Text
#1 30
#2 30
#3 30
#4 31
#5 Text
#6 31
#7 31
#8 31
#9 32
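A minimal, self-contained sketch of these two steps, assuming a single-column frame built by hand in which empty strings mark the blanks (the sample values are made up to mirror the question). It keeps the returned frame rather than combining assignment with `inplace=True`; in pandas 1.x and 2.x, `fillna` is documented as returning None when `inplace=True`, which would explain why the code in the question printed None.

```python
import pandas as pd

# Hypothetical sample mirroring the question's first column
df = pd.DataFrame({0: ["Text", "30", "", "", "31", "Text", "31", "", "", "32"]})

# Blank out empty strings in column 0 only (mask puts NaN where the condition is True)
df[0] = df[0].mask(df[0] == "")

# Forward fill returns a new frame; keep the result
filled = df.ffill()
print(filled)
```

Note that this fills every blank, including any that follow a "Text" row, so it covers only the unconditional case.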
Upvotes: 73
Reputation: 862551
I think you can first replace the whitespace-only cells with NaN:
df.Text = df.Text[df.Text.str.strip() != '']
print (df)
Text Text.1
0 30 Text Text
1 NaN Text Text
2 NaN Text Text
3 31 Text Text
4 Text Text NaN
5 31 Text Text
6 NaN Text Text
7 NaN Text Text
8 32 Text Text
9 Text Text NaN
10 NaN Text Text
11 NaN Text Text
12 NaN Text Text
13 NaN Text Text
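Then use ffill (the same as fillna with the ffill method) to forward fill, build a mask with to_numeric that marks where the filled value is numeric, use where to restore the original value wherever it is not, and finally replace the remaining NaN with an empty string using fillna: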
orig = df.Text.copy()
df.Text = df.Text.ffill()
mask1 = pd.to_numeric(df.Text, errors='coerce').notna()
df.Text = df.Text.where(mask1, orig).fillna('')
print (df)
Text Text.1
0 30 Text Text
1 30 Text Text
2 30 Text Text
3 31 Text Text
4 Text Text NaN
5 31 Text Text
6 31 Text Text
7 31 Text Text
8 32 Text Text
9 Text Text NaN
10 Text Text
11 Text Text
12 Text Text
13 Text Text
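The steps above can be sketched as one self-contained function. This is only an illustration under stated assumptions: the helper name `ffill_numeric_only` and the sample values are hypothetical, and blanks are taken to be either NaN or whitespace-only strings.

```python
import pandas as pd

def ffill_numeric_only(col: pd.Series) -> pd.Series:
    # Treat NaN and whitespace-only strings as blanks
    blank = col.isna() | (col.astype(str).str.strip() == "")
    cleaned = col.mask(blank)
    # Forward fill everything first
    filled = cleaned.ffill()
    # Boolean mask: True where the filled value parses as a number
    is_num = pd.to_numeric(filled, errors="coerce").notna()
    # Keep filled values only where numeric, otherwise restore the original
    return filled.where(is_num, cleaned).fillna("")

# Made-up sample: blanks after numbers get filled, blanks after text stay empty
sample = pd.Series(["Text", "30", " ", "", "31", "Text",
                    "31", "", "32", "Text", "", ""])
result = ffill_numeric_only(sample)
print(result.tolist())
```

The `.notna()` call matters here because `where` expects a boolean condition rather than the float values that `to_numeric` returns.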
Upvotes: 15