everestial

How do I replace period values in a DataFrame with null or other values?

The following code:

print(PB_PID_group)
print(type(PB_PID_group))

gives me:

PI
.             [., 5398, 5482, 5467]
1311    [5185, ., 5398, 5467, 5576]
1667                      [., 6446]
3352                            [.]
935                             [.]
Name: PID, dtype: object
<class 'pandas.core.series.Series'>
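To experiment without the original input file, a Series shaped like the printout above can be rebuilt by hand. This is a sketch under assumptions: every value, index labels included, is taken to be a string (the quoted values in the to_csv output suggest the PID elements are), and each cell is taken to be a plain Python list. The to_csv output has no commas between elements, which is how NumPy arrays print, so the real cells may be arrays; the apply-based fix in the answer iterates over either kind the same way.

```python
import pandas as pd

# Hand-built stand-in for PB_PID_group. Assumption: all values, including
# the index labels, are strings, and each cell is a plain Python list.
PB_PID_group = pd.Series(
    [
        [".", "5398", "5482", "5467"],
        ["5185", ".", "5398", "5467", "5576"],
        [".", "6446"],
        ["."],
        ["."],
    ],
    index=pd.Index([".", "1311", "1667", "3352", "935"], name="PI"),
    name="PID",
)

print(PB_PID_group)
print(PB_PID_group.dtype)  # object: each cell holds a list
```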

I then converted it to a pandas DataFrame:

PB_PID_df = pd.DataFrame(PB_PID_group)

print(type(PB_PID_df))

which gives me:

<class 'pandas.core.frame.DataFrame'>

Then I write the DataFrame to a file:

PB_PID_df.to_csv('updated_df_table.txt', sep='\t', index=True, na_rep='none')

which writes:

PI      PID
.       ['.' '5398' '5482' '5467']
1311    ['5185' '.' '5398' '5467' '5576']
1667    ['.' '6446']
3352    ['.']
935     ['.']

I want to remove the rows whose PI value is a period (.), and remove only the periods from the PID column.

I tried:

PB_PID_df['PID'] = PB_PID_df['PID'].replace(to_replace='.', value='na', regex=True)

I also tried it without regex and with other method options, but it's not working.

Any suggestions?

Thanks,


Answers (1)

EdChum

When you made a DataFrame from the existing Series, the index was reused, so to drop the first row you call drop and pass that row's label, '.'.
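As a sketch of the same step without inplace, the row can either be dropped by its label or filtered out with a boolean mask on the index. The toy frame below is an assumption that mirrors the question's data, using string labels.

```python
import pandas as pd

# Assumed toy data shaped like the question's DataFrame.
df = pd.DataFrame(
    {"PID": [[".", "5398"], ["5185", ".", "5398"]]},
    index=pd.Index([".", "1311"], name="PI"),
)

# Option 1: drop the row by its index label; returns a new DataFrame.
dropped = df.drop(".")

# Option 2: keep only rows whose index label is not ".".
masked = df[df.index != "."]

print(dropped)
print(masked)
```

Both leave only the 1311 row; the mask form is handy when the condition is more than a single label.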

Because each cell now holds a list, which is unusual, replace no longer works: it looks for cells whose whole value matches to_replace and does not look inside lists. Instead, use apply to test each element of each list and replace '.' with the string 'na':

In [12]:
import pandas as pd
# set up some data
df = pd.DataFrame({'PID': [['.', 5398, 5482, 5467], [5185, '.', 5398, 5467, 5576]]}, index=['.', 1311])
df

Out[12]:
                              PID
.           [., 5398, 5482, 5467]
1311  [5185, ., 5398, 5467, 5576]

Now drop the row and replace the values using apply with a lambda and a list comprehension:

In [13]:
df.drop('.', inplace=True)
df['PID'] = df['PID'].apply(lambda lst: [v if v != '.' else 'na' for v in lst])
df

Out[13]:
                               PID
1311  [5185, na, 5398, 5467, 5576]
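A quick way to see why the question's replace attempt had no effect: with a scalar to_replace and the default regex=False, Series.replace compares each whole cell with '.', and a cell holding a list never equals that string. The data in this sketch is made up for illustration.

```python
import pandas as pd

# Made-up Series whose cells are lists, like the PID column.
s = pd.Series([[".", "5398"], ["5185", "."]])

# Each whole cell (a list) is compared with "."; none match,
# so nothing is replaced.
unchanged = s.replace(".", "na")
print(unchanged.tolist())  # [['.', '5398'], ['5185', '.']]
```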

EDIT

To answer the follow-up question in the comments: to remove the '.' values instead of replacing them, move the if condition to the end of the list comprehension so it filters elements out:

In [19]:
df['PID'] = df['PID'].apply(lambda lst: [v for v in lst if v != '.'])
df

Out[19]:
                           PID
1311  [5185, 5398, 5467, 5576]
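A different technique, not the one used above: Series.explode (available since pandas 0.25) puts each list element on its own row, so ordinary boolean filtering can remove both the '.' index row and the '.' elements, and groupby(...).agg(list) rebuilds the lists. This is a sketch on assumed toy data. One behavioral difference: a row whose list contained only '.' (such as 3352 and 935 in the question) disappears entirely here, whereas the list comprehension above leaves it with an empty list.

```python
import pandas as pd

# Assumed toy data shaped like the question's DataFrame.
df = pd.DataFrame(
    {"PID": [[".", "5398", "5482"], ["5185", ".", "5398"], [".", "6446"], ["."]]},
    index=pd.Index([".", "1311", "1667", "935"], name="PI"),
)

# One row per list element; each index label repeats once per element.
flat = df["PID"].explode()

# Keep rows whose label is not "." and whose element is not ".".
# Plain NumPy boolean arrays avoid any index alignment on duplicate labels.
keep = (flat.index != ".") & (flat.to_numpy() != ".")
flat = flat[keep]

# Rebuild one list per label; sort=False keeps first-appearance order.
result = flat.groupby(level=0, sort=False).agg(list).to_frame("PID")
print(result)
```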
