Reputation:
I have a DataFrame with 22 columns and 65 rows. The data comes from a CSV file. Each value in the DataFrame has an extra, unwanted whitespace character. So if I loop over the 'Year' column and call len() on each value, I get:
2019 5
2019 5
2018 5
...
This one extra whitespace character appears in every value throughout the DataFrame. I tried calling .strip() on the DataFrame, but no such attribute exists.
I also tried looping over the columns with df[column].str.strip(), but the columns have various data types (dtypes: float64(6), int64(4), object(14)), so this raises an error.
Any ideas on how to apply a function to the entire DataFrame, and if so, which function or method? If not, what is the best way to handle this?
Upvotes: 0
Views: 3423
Reputation: 142
Why not try this?
for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).strip())
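One thing to watch for: str(x) turns every value into a string, so numeric columns stop being numeric afterwards. A minimal sketch of that side effect, using made-up sample data (the column names and values are assumptions, not the asker's actual file):

```python
import pandas as pd

# Hypothetical sample data: one integer column and one padded text column.
df = pd.DataFrame({'Year': [2019, 2019, 2018], 'Name': ['a ', 'b ', 'c ']})

stripped = df.copy()
for column in stripped.columns:
    # str(x) converts each value to text before stripping it
    stripped[column] = stripped[column].apply(lambda x: str(x).strip())

print(df.dtypes)        # 'Year' starts out as an integer dtype
print(stripped.dtypes)  # 'Year' now holds Python strings, not integers
```

If the numeric columns are needed later, they would have to be converted back, or skipped in the loop in the first place.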
Upvotes: 0
Reputation: 71
Try this:
for column in df.columns:
    # Removes every space (including spaces inside a value) and converts values to str
    df[column] = df[column].apply(lambda x: str(x).replace(' ', ''))
Upvotes: 0
Reputation: 59549
Handle the error:
for col in df.columns:
    try:
        df[col] = df[col].str.strip()
    except AttributeError:
        pass
Normally, I'd say select the object dtypes, but that can still be problematic if the data are messy enough to store numeric data in an object container.
import pandas as pd
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven ']*3})
df['foo2'] = df.foo.astype(object)
for col in df.select_dtypes('object'):
    df[col] = df[col].str.strip()
# AttributeError: Can only use .str accessor with string values!
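If you would rather not rely on catching the exception, one alternative is to check each value and strip only the ones that are actually strings. This is a rough sketch rather than the definitive way to do it; strip_strings is a hypothetical helper name, and it reuses the example frame from above:

```python
import pandas as pd

def strip_strings(df):
    """Return a copy of df with whitespace stripped from str values only."""
    out = df.copy()
    for col in out.columns:
        # Numeric dtypes cannot hold padded strings, so leave them alone.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # Strip real strings; pass every other value through unchanged.
        out[col] = out[col].map(lambda x: x.strip() if isinstance(x, str) else x)
    return out

df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven '] * 3})
df['foo2'] = df.foo.astype(object)

clean = strip_strings(df)
print(clean['bar'].tolist())
print(clean['foo2'].tolist())
```

The per-value isinstance check is slower than the vectorised .str.strip(), but on a frame of 65 rows that should not matter, and it does not trip over an object column that holds numbers.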
Upvotes: 2
Reputation: 3391
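You can use the apply() function to do this. Only string values have .strip(), so check the type first:
df['Year'] = df['Year'].apply(lambda x: x.strip() if isinstance(x, str) else x)
You can apply the same function to each column separately:
for column in df.columns:
    # Non-string values (ints, floats, NaN) are returned unchanged
    df[column] = df[column].apply(lambda x: x.strip() if isinstance(x, str) else x)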
Upvotes: 0