Reputation: 145
I have a data frame that contains a column with comma separated values. I would like to convert the string values in that column to integers.
I am newish to coding in general, so a brief explanation of what is happening would be massively appreciated, if you have time.
I have tried the following code.
df['col3'].str.strip(',').astype(int)
Current df:
col1 col2 col3
1 x 12,123
2 x 1,123
3 y 45,998
Desired df:
col1 col2 col3
1 x 12123
2 x 1123
3 y 45998
Upvotes: 8
Views: 6648
Reputation: 1888
All the other answers solve this after the data has been read from a source such as a CSV or Excel file. Another way to look at the problem is to normalize the data while reading it. Here is how to do that with read_csv or read_excel:
pd.read_csv('your_file_name', thousands=',')
pd.read_excel('your/file/name', thousands=',')
See the pandas documentation for read_excel and read_csv.
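As a rough illustration of the thousands parameter, here is a minimal sketch that reads the question's data from an in-memory string instead of a file. The csv_text contents and the semicolon separator are assumptions made for the example (a comma-separated file would need the col3 values quoted so the commas are not treated as field separators).

```python
import io

import pandas as pd

# Hypothetical stand-in for a real file: fields are separated by semicolons
# so the commas inside col3 can only be thousands separators.
csv_text = "col1;col2;col3\n1;x;12,123\n2;x;1,123\n3;y;45,998\n"

# thousands="," tells the parser to drop the commas while parsing numbers,
# so col3 arrives as integers with no post-processing.
df = pd.read_csv(io.StringIO(csv_text), sep=";", thousands=",")

print(df["col3"].tolist())  # [12123, 1123, 45998]
print(df["col3"].dtype)
```

This only helps if you control how the data is loaded; for a data frame that already exists, one of the string-based approaches is still needed.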
Upvotes: 0
Reputation: 8816
There are already answers to this question, but I would like to add another solution:
DataFrame:
>>> df
col1 col2 col3
0 1 x 12,123
1 2 x 1,123
2 3 y 45,998
The simplest approach is the str.replace method, and you are all done:
>>> df['col3'] = df['col3'].str.replace(",", "")
# df['col3'] = df['col3'].str.replace(",", "").astype(int) <- cast to int
>>> df
col1 col2 col3
0 1 x 12123
1 2 x 1123
2 3 y 45998
Or use replace with regex=True. The substitution is performed under the hood with re.sub, so the rules for substitution are the same as for re.sub. Without regex=True, replace only matches whole cell values, so a comma inside a longer string would not be touched.
>>> df['col3'] = df['col3'].replace(',', '', regex=True)
>>> df
col1 col2 col3
0 1 x 12123
1 2 x 1123
2 3 y 45998
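Putting the str.replace approach together with the cast, a minimal self-contained sketch might look like this. The data frame is rebuilt from the question's sample values, and regex=False is an addition of mine to make it explicit that the comma is treated as a literal character.

```python
import pandas as pd

# Sample data copied from the question; col3 holds strings, not numbers.
df = pd.DataFrame(
    {
        "col1": [1, 2, 3],
        "col2": ["x", "x", "y"],
        "col3": ["12,123", "1,123", "45,998"],
    }
)

# Remove every comma (literal match, no regex), then cast the cleaned
# strings to integers and assign the result back to the column.
df["col3"] = df["col3"].str.replace(",", "", regex=False).astype(int)

print(df)
print(df["col3"].dtype)
```

Note that the result has to be assigned back to df['col3']; the string methods return a new Series and leave the original column unchanged.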
Upvotes: 8
Reputation: 6091
Brief explanation:
df['col3'].str.split(',').str.join('').astype(int)
df['col3'] selects col3 from the data frame as a pandas.Series.
.str is the accessor that lets you apply a string method to every element of the Series. It is not a cast: the values must already be strings.
.str.split(',') uses the split method: it breaks each string into a list of substrings, using the separator passed as the parameter to decide where one substring ends and the next one begins.
.str.join('') takes the substrings generated by the split and concatenates them back together (effectively you're just removing the separator).
.astype(int) casts the result to int.
Credit to nixon for including the join to generate the actual desired output. Hope this helps, happy coding!
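To see what each link of the chain produces, the steps can be run one at a time. This is only a sketch on a standalone Series built from the question's values; the variable names parts, joined and numbers are mine, and the splitting step is written with str.split, which is the method that actually breaks a string into substrings.

```python
import pandas as pd

# Standalone Series with the same strings as col3 in the question.
s = pd.Series(["12,123", "1,123", "45,998"])

# Step 1: each string becomes a list of the pieces between the commas.
parts = s.str.split(",")
print(parts.tolist())    # [['12', '123'], ['1', '123'], ['45', '998']]

# Step 2: each list is glued back together with an empty separator.
joined = parts.str.join("")
print(joined.tolist())   # ['12123', '1123', '45998']

# Step 3: the comma-free strings are converted to integers.
numbers = joined.astype(int)
print(numbers.tolist())  # [12123, 1123, 45998]
```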
Upvotes: 2
Reputation: 88236
I think your solution should actually be:
df['col3'] = df.col3.str.split(',').str.join('').astype(int)
col1 col2 col3
0 1 x 12123
1 2 x 1123
2 3 y 45998
This is because str.strip only removes characters from the left and right ends of each string, so the commas in the middle are left in place.
Explanation
str: allows vectorized string functions on a Series
split: splits each string in the Series according to some pattern, "," in this case
join: joins the elements of each list in the resulting Series of lists with the passed delimiter, '' here, as you want to create ints
.astype(int): finally, turns each string into an integer
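A quick way to convince yourself of the strip behaviour is to run the original attempt and look at what comes back. The sketch below uses the question's values plus one extra made-up string, ',7,', that has commas at both ends; cast_failed is a hypothetical flag used only to record whether the cast raised.

```python
import pandas as pd

# Question's values plus a hypothetical ',7,' to show what strip does remove.
s = pd.Series(["12,123", "1,123", "45,998", ",7,"])

# strip(',') only trims commas at the start and end of each string.
stripped = s.str.strip(",")
print(stripped.tolist())  # ['12,123', '1,123', '45,998', '7']

# The interior commas are still there, so the integer cast fails.
try:
    stripped.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)  # True

# split + join removes every comma, so the cast succeeds.
fixed = s.str.split(",").str.join("").astype(int)
print(fixed.tolist())  # [12123, 1123, 45998, 7]
```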
Upvotes: 11