user3159004

Remove white space from entire DataFrame

I have a DataFrame with 22 columns and 65 rows, loaded from a CSV file. Every value in the DataFrame has one extra, unwanted whitespace character. For example, if I loop over the 'Year' column and call len() on each value, I get:

2019  5
2019  5
2018  5
...

This one extra whitespace character appears in every value throughout the DataFrame. I tried calling .strip() on the DataFrame, but no such attribute exists.

I also tried looping over the columns with df[column].str.strip(), but the columns have various data types (dtypes: float64(6), int64(4), object(14)), so this raises an error.

Any ideas on how to apply a function to the entire DataFrame, and if so, which function or method? If not, what is the best way to handle this?

Upvotes: 0

Views: 3423

Answers (4)

Ming Jun Lim

Reputation: 142

Why not try this?

for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).strip())
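
One thing to be aware of with this approach: str(x) converts every value to a string, so numeric columns stop being numeric. Below is a minimal sketch of that effect. The two-column frame is made up for illustration (its column names and values are assumptions, not the question's data), and pd.to_numeric is shown as one possible way to convert a column back afterwards.

```python
import pandas as pd

# Hypothetical sample data: one numeric column, one text column with trailing spaces.
df = pd.DataFrame({'Year': [2019, 2018], 'Name': ['alpha ', 'beta ']})

for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).strip())

# Every value is now a Python str, including the former integers.
year_values = df['Year'].tolist()

# One way to restore a numeric column afterwards.
df['Year'] = pd.to_numeric(df['Year'])
```

If the numeric columns need to stay numeric, it may be simpler to leave them alone and strip only the string values.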

Upvotes: 0

Emad

Reputation: 71

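Try this:

for column in df.columns:
    # Note: this turns each double space into a single space; it does not remove a lone leading or trailing space.
    df[column] = df[column].apply(lambda x: str(x).replace('  ', ' '))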

Upvotes: 0

ALollz

Reputation: 59549

Handle the error:

for col in df.columns:
    try:
        df[col] = df[col].str.strip()
    except AttributeError:
        pass

Normally, I'd say select the object dtypes, but that can still be problematic if the data are messy enough to store numeric data in an object container.

import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven ']*3})
df['foo2'] = df.foo.astype(object)

for col in df.select_dtypes('object'):
    df[col] = df[col].str.strip()
#AttributeError: Can only use .str accessor with string values!
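
One possible way around the mixed-object problem is to test each value rather than each column. The following is a minimal sketch under that assumption. strip_strings is a hypothetical helper name (not a pandas function), and the sample frame is the same one built above.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def strip_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with whitespace stripped from string values only."""
    out = df.copy()
    for col in out.columns:
        # Numeric columns cannot hold strings, so skip them.
        if is_numeric_dtype(out[col]):
            continue
        # Strip str values; leave ints, floats, NaN, etc. untouched.
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven '] * 3})
df['foo2'] = df.foo.astype(object)

clean = strip_strings(df)
```

Checking isinstance(v, str) per value avoids the .str accessor entirely, so a column of numbers stored as objects (like foo2) passes through without raising.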

Upvotes: 2

Mohsen_Fatemi

Reputation: 3391

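You can use the apply() function to do this. Because the columns hold mixed types, strip only the values that are strings:

df['Year'] = df['Year'].apply(lambda x: x.strip() if isinstance(x, str) else x)

You can apply the same function to each column separately:

for column in df.columns:
    df[column] = df[column].apply(lambda x: x.strip() if isinstance(x, str) else x)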

Upvotes: 0
