Daniel_Fortesque

Reputation: 13

Manipulate Dataframe Series

I have a dataframe and I want to change some elements of a column based on a condition. In particular, given this column:

...    VALUE    ....
       0
       "1076A"
       12
       9
       "KKK0139"
       5

I want to obtain this:

...    VALUE    ....
       0
       "1076A"
       12
       9
       "0139"
       5

The 'VALUE' column contains both strings and numbers. When I find a particular substring in a string value, I want to get the same value without that substring. I have tried:

1) df['VALUE'] = np.where(df['VALUE'].str.contains('KKK', na=False), df['VALUE'].str[3:], df['VALUE'])

2) df.loc[df['VALUE'].str.contains('KKK', na=False), 'VALUE'] = df['VALUE'].str[3:]

But both attempts raise an IndexError: invalid index to scalar variable

Any advice?

Upvotes: 1

Views: 272

Answers (4)

SeaBean

Reputation: 23237

As the column contains both numeric (non-string) values and string values, you cannot use .str.replace(), since it handles strings only. Use .replace() instead; otherwise, str.replace() converts the non-string elements to NaN.

Here, you can use:

df['VALUE'] = df['VALUE'].replace(r'KKK', '', regex=True)

Input:

import pandas as pd

data = {'VALUE': [0, "1076A", 12, 9, "KKK0139", 5]}
df = pd.DataFrame(data)

Result:

0        0
1    1076A
2       12
3        9
4     0139
5        5
Name: VALUE, dtype: object
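A self-contained version of the input and the replacement above might look like this. It relies on Series.replace with regex=True rewriting only the string entries and passing the integers through unchanged:

```python
import pandas as pd

# Mixed column: Python ints and strings, as in the question
df = pd.DataFrame({'VALUE': [0, "1076A", 12, 9, "KKK0139", 5]})

# Series.replace with regex=True applies the pattern to string entries only;
# the integer entries are left as they are
df['VALUE'] = df['VALUE'].replace(r'KKK', '', regex=True)

print(df['VALUE'].tolist())
# [0, '1076A', 12, 9, '0139', 5]
```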

If you use .str.replace() instead, you will get the output below. Note that the numeric (non-string) values become NaN:

0      NaN
1    1076A
2      NaN
3      NaN
4     0139
5      NaN
Name: VALUE, dtype: object
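The difference between the two methods can be checked side by side. This is a small sketch on the same sample data; the variable names via_str and via_replace are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'VALUE': [0, "1076A", 12, 9, "KKK0139", 5]})

# .str methods only operate on strings; non-string entries become NaN
via_str = df['VALUE'].str.replace('KKK', '', regex=False)

# Series.replace with regex=True skips non-strings instead of nulling them
via_replace = df['VALUE'].replace(r'KKK', '', regex=True)

print(via_str.isna().tolist())
# [True, False, True, True, False, True]
print(via_replace.isna().tolist())
# [False, False, False, False, False, False]
```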

In general, if you want to remove a leading alphabetic substring, you can use:

df['VALUE'] = df['VALUE'].replace(r'^[A-Za-z]+', '', regex=True)
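As a quick check of the anchored pattern, here is a sketch on sample data that adds a hypothetical value "AB12" (not from the question) with a different letter prefix. Because of the ^ anchor, only letters at the start of a string are removed, so the trailing "A" in "1076A" survives:

```python
import pandas as pd

# "AB12" is a hypothetical extra value, added only to show another prefix
df = pd.DataFrame({'VALUE': [0, "1076A", 12, "KKK0139", "AB12", 5]})

# ^[A-Za-z]+ matches a run of letters at the start of a string only;
# integers are not strings, so Series.replace leaves them untouched
df['VALUE'] = df['VALUE'].replace(r'^[A-Za-z]+', '', regex=True)

print(df['VALUE'].tolist())
# [0, '1076A', 12, '0139', '12', 5]
```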

Upvotes: 1

Corralien

Reputation: 120559

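This output assumes every value in the column is stored as a string; in a mixed column like the one in the question, the numeric entries come back as NaN.

>>> df['VALUE'].str.replace(r'KKK', '')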
0        0
1    1076A
2       12
3        9
4     0139
5        5
Name: VALUE, dtype: object

Upvotes: 1

ThePyGuy

Reputation: 18476

Looking at your sample data, if K is the only problem, just replace it with an empty string:

df['VALUE'].str.replace('K', '')
0          0
1    "1076A"
2         12
3          9
4     "0139"
5          5
Name: VALUE, dtype: object

If you only want to remove K at specific occurrences or positions, you can do that as well.
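For example, assuming every entry is a string (as the quoted output above suggests), Series.str.replace accepts an n argument that limits the number of replacements, and an anchored regex restricts the match to the start of the string. "KxK12" below is a hypothetical value chosen to show the difference:

```python
import pandas as pd

# Assumes all entries are strings; "KxK12" is a hypothetical extra value
s = pd.Series(["0", "1076A", "KKK0139", "KxK12"])

# n=1: remove only the first occurrence of "K" in each string
first_only = s.str.replace('K', '', n=1, regex=False)

# ^K+: remove only a run of K's at the very start of each string
leading_only = s.str.replace(r'^K+', '', regex=True)

print(first_only.tolist())
# ['0', '1076A', 'KK0139', 'xK12']
print(leading_only.tolist())
# ['0', '1076A', '0139', 'xK12']
```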

Upvotes: 0

Erlinska

Reputation: 433

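Your second solution fails because the right side of the assignment, df['VALUE'].str[3:], is evaluated on the whole column, including the numeric entries. Apply the same row selector to the right side as well, so that only the matching strings are sliced: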

df.loc[df['VALUE'].str.contains('KKK', na=False), 'VALUE'] = df.loc[df['VALUE'].str.contains('KKK', na=False), 'VALUE'].str[3:]
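A self-contained sketch of the same fix, with the condition stored in a variable (the name mask is just for illustration) so it is computed once and reused on both sides of the assignment:

```python
import pandas as pd

df = pd.DataFrame({'VALUE': [0, "1076A", 12, 9, "KKK0139", 5]})

# Boolean mask of rows whose value is a string containing "KKK";
# non-string entries would give NaN, which na=False turns into False
mask = df['VALUE'].str.contains('KKK', na=False)

# Slice only the selected rows, so .str[3:] never touches the numbers
df.loc[mask, 'VALUE'] = df.loc[mask, 'VALUE'].str[3:]

print(df['VALUE'].tolist())
# [0, '1076A', 12, 9, '0139', 5]
```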

Upvotes: 0
