user14383280
user14383280

Reputation:

Can't remove spaces from pandas dataframe

Have this data:

                              region gdp_per_capita
0                      Coasts of USA         71 546
1          USA: New York, New Jersey         81 615
2                    USA: California         74 205
3                   USA: New England         74 000

Wanna get this:

                              region gdp_per_capita
0                      Coasts of USA          71546
1          USA: New York, New Jersey          81615
2                    USA: California          74205
3                   USA: New England          74000

Tried to use df.columns = df.columns.str.replace(' ', ''), but it did not work

Upvotes: 2

Views: 3107

Answers (4)

Pavel Fedotov
Pavel Fedotov

Reputation: 885

import re
df['gdp_per_capita'] = df['gdp_per_capita'].apply(lambda x: re.sub("[^0-9]", "", str(x))).astype(int)

Upvotes: 1

NYC Coder
NYC Coder

Reputation: 7604

Just this should do:

df['gdp_per_capita'] = df['gdp_per_capita'].astype(str).str.replace('\s+', '').replace('nan', np.nan)
df['gdp_per_capita'] = pd.to_numeric(df['gdp_per_capita'])
print(df)

                      region gdp_per_capita
0              Coasts of USA          71546
1  USA: New York, New Jersey          81615
2            USA: California          74205
3           USA: New England          74000

Upvotes: 5

rafaelc
rafaelc

Reputation: 59294

Looks like you want to work with numbers rather than strings.

Hence, replacing ' ' with '' and using pd.to_numeric seems like an easy and solid approach.

Let me suggest another one which might or might not be good (it depends on your dataset).

If the thousands in your dataset are separated by a whitespace (' '), you can just read your df as

df = pd.read_csv(file, thousands = ' ')

and all your columns with 74 109 would be read as 74109 and dtype integer or float.

Upvotes: 2

user13036845
user13036845

Reputation:

I am not quite sure it will work or not but try the following:

  1. Trim leading space of column in pandas – lstrip()
  2. Trim trailing space of column in pandas – rstrip()
  3. Trim Both leading and trailing space of column in pandas – strip()
  4. Strip all the white space of column in pandas.

Let me know if it works :)

Upvotes: 0

Related Questions