How to replace period values in a DataFrame with null or other values?

Question

The following code:

print(PB_PID_group)
print(type(PB_PID_group))

gives me:

PI
.             [., 5398, 5482, 5467]
1311    [5185, ., 5398, 5467, 5576]
1667                      [., 6446]
3352                            [.]
935                             [.]
Name: PID, dtype: object

I then converted this to a pandas DataFrame:

PB_PID_df = pd.DataFrame(PB_PID_group)

print(type(PB_PID_df))

which gives me:

<class 'pandas.core.frame.DataFrame'>

Then I write the dataframe to a file:

PB_PID_df.to_csv('updated_df_table.txt', sep='\t', index=True, na_rep='none')

writes:

PI      PID
.       ['.' '5398' '5482' '5467']
1311    ['5185' '.' '5398' '5467' '5576']
1667    ['.' '6446']
3352    ['.']
935     ['.']

I want to remove the rows whose PI value is a period (.), and remove only the periods from the lists in the PID column.

I tried:

PB_PID_df['PID'] = PB_PID_df['PID'].replace(to_replace='.', value='na', regex=True)

I also tried without regex and with other method options, but it's not working.

Any suggestions?

Thanks,

EdChum · Accepted Answer

When you created a DataFrame from the existing Series, the index was reused, so to drop the first row you need to call drop and pass that row's label, '.'.

Because the cells now hold lists (an unusual thing to store in a DataFrame), replace no longer works: it looks for exact cell values and does not look inside lists. Instead, use apply to test each element of each list and replace matches with the string 'na':

In [12]:
# setup some data
df = pd.DataFrame({'PID':[['.',5398, 5482, 5467], [5185, '.', 5398, 5467, 5576]]}, index=['.',1311])
df

Out[12]:
                              PID
.           [., 5398, 5482, 5467]
1311  [5185, ., 5398, 5467, 5576]

Now drop and replace using apply with lambda and list comprehension:

In [13]:
df.drop('.',inplace=True)
df['PID'] = df['PID'].apply(lambda lst: [v if v != '.' else 'na' for v in lst])
df

Out[13]:
                               PID
1311  [5185, na, 5398, 5467, 5576]
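
A side note on the attempt in the question: even on plain string cells, an unescaped '.' combined with regex=True would probably not do what was intended, because '.' is the regular-expression wildcard. A minimal sketch using only the standard re module illustrates the difference; the sample string is made up for illustration and is not taken from the original data:

```python
import re

cell = "5185.5398"  # hypothetical string cell, not from the original data

# An unescaped '.' matches every character, so each one is replaced.
wildcard = re.sub(".", "x", cell)

# Escaping the period restricts the match to literal '.' characters.
literal = re.sub(re.escape("."), "na", cell)

print(wildcard)  # xxxxxxxxx
print(literal)   # 5185na5398
```

Neither form reaches inside list-valued cells, which is why the answer above goes through apply instead.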

EDIT

To answer the additional query in comments, to remove a value modify the list comprehension so that the if condition is at the end:

In [19]:
df['PID'] = df['PID'].apply(lambda lst: [v for v in lst if v != '.'])
df

Out[19]:
                           PID
1311  [5185, 5398, 5467, 5576]
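
Putting the steps together, the following is a rough end-to-end sketch rather than a definitive recipe. It assumes the starting point is an object Series of lists indexed by PI, as in the question; the names pid_group, pid_df and strip_periods are hypothetical, and joining each list into a comma-separated string before writing is an optional extra that the original answer does not cover:

```python
import pandas as pd

# Hypothetical stand-in for PB_PID_group: a Series of lists indexed by PI.
pid_group = pd.Series(
    [[".", "5398", "5482", "5467"], ["5185", ".", "5398", "5467", "5576"], ["."]],
    index=pd.Index([".", "1311", "3352"], name="PI"),
    name="PID",
)


def strip_periods(values):
    """Return a new list without the '.' placeholder entries."""
    return [v for v in values if v != "."]


pid_df = pid_group.to_frame()
pid_df = pid_df.drop(".")                           # drop the row labelled '.'
pid_df["PID"] = pid_df["PID"].apply(strip_periods)  # clean each list

# Optional: join each list into one string so the output holds plain text
# instead of a list repr; lists left empty become None (written as na_rep).
pid_df["PID"] = pid_df["PID"].apply(lambda vals: ",".join(vals) if vals else None)

# With no path given, to_csv returns the text instead of writing a file.
text = pid_df.to_csv(sep="\t", na_rep="none")
print(text)
```

Passing a file name such as 'updated_df_table.txt' as the first argument to to_csv would write the same content to disk.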
