excelguy

Reputation: 1624

Python, converting int to str, trailing/leading decimal/zeros

I convert my dataframe values to str, but when I concatenate them, the values that were ints come out with a trailing decimal (.0).

df["newcol"] = df['columna'].map(str) + '_' + df['columnb'].map(str) + '_' + df['columnc'].map(str)

This gives me output like 500.0. How can I get rid of this leading/trailing decimal? Sometimes the data in columna contains non-alphanumeric characters.

+---------+---------+---------+------------------+----------------------+
| columna | columnb | columnc |     expected     |  currently getting   |
+---------+---------+---------+------------------+----------------------+
|         |      -1 |      27 | _-1_27           | _-1.0_27.0           |
|         |      -1 |      42 | _-1_42           | _-1.0_42.0           |
|         |      -1 |      67 | _-1_67           | _-1.0_67.0           |
|         |      -1 |      95 | _-1_95           | _-1.0_95.0           |
| 91_CCMS |   14638 |      91 | 91_CCMS_14638_91 | 91_CCMS_14638.0_91.0 |
| DIP96   |    1502 |      96 | DIP96_1502_96    | DIP96_1502.0_96.0    |
| 106     |   11694 |     106 | 106_11694_106    | 00106_11694.0_106.0  |
+---------+---------+---------+------------------+----------------------+

Error:

invalid literal for int() with base 10: ''

Upvotes: 2

Views: 186

Answers (2)

I couldn't reproduce this error, but you could try something like:

df["newcol"] = df['columna'].map(lambda x: str(int(x)) if isinstance(x, int) else str(x)) + '_' + df['columnb'].map(lambda x: str(int(x))) + '_' + df['columnc'].map(lambda x: str(int(x)))
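The error quoted in the question, invalid literal for int() with base 10: '', is what int('') raises, so a blank cell reaching str(int(x)) would trigger it. Below is a minimal sketch of a per-cell formatter that tolerates blanks, NaN and non-numeric text. The name to_clean_str is hypothetical, and the sample values are only an illustration based on the first row of the question's table:

```python
import math

def to_clean_str(value):
    """Format one cell: drop a trailing '.0' from whole numbers, keep text as-is."""
    # Missing values (None or float NaN) become an empty string.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ''
    # Text (including '' and labels like '91_CCMS') passes through unchanged.
    if isinstance(value, str):
        return value
    # Whole-number floats such as 500.0 are rendered as '500'.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Anything else (ints, non-whole floats) keeps its default string form.
    return str(value)

# Illustrative row: blank columna, then two whole-number floats.
parts = [to_clean_str(v) for v in ['', -1.0, 27.0]]
joined = '_'.join(parts)
```

With pandas, the same function could be applied per column, for example df['columnb'].map(to_clean_str), in place of the lambdas above.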

Upvotes: 1

Andy L.

Reputation: 25269

Edit:
If your df has more than 3 columns and you want to join only 3 of them, select those columns in the command before calling fillna. For example, assume your df has 5 columns named AA, BB, CC, DD, EE, and you only want to join CC, DD and EE. Specify those 3 columns before the fillna and assign the result to newcol:

df["newcol"] = df[['CC', 'DD', 'EE']].fillna('') \
                   .applymap(lambda x: x if isinstance(x, str) else str(int(x))).agg('_'.join, axis=1)

Note: I split the command across 2 lines with '\' only for readability.


Original:
I guess the real data in columna, columnb and columnc contains str, float, int, empty strings, blank spaces, and maybe even NaN.

A float with a zero fractional part (e.g. 27.0) in a column of dtype object is displayed without the decimal.

Assume your df has only 3 columns, columna, columnb and columnc, as you said. The command below handles str, float, int and NaN, and joins the 3 columns into one as you want:

df.fillna('').applymap(lambda x: x if isinstance(x, str) else str(int(x))).agg('_'.join, axis=1)

I created a sample similar to yours:

   columna columnb columnc
0               -1      27
1      NaN      -1      42
2               -1      67
3               -1      95
4  91_CCMS   14638      91
5    DIP96              96
6      106   11694     106

Your command returns concatenated strings containing '.0', as you described:

df['columna'].map(str) + '_' + df['columnb'].map(str) + '_' + df['columnc'].map(str)

Out[1926]:
0          _-1.0_27.0
1       nan_-1.0_42.0
2          _-1.0_67.0
3          _-1.0_95.0
4    91_CCMS_14638_91
5           DIP96__96
6       106_11694_106
dtype: object

Using my command:

df.fillna('').applymap(lambda x: x if isinstance(x, str) else str(int(x))).agg('_'.join, axis=1)

Out[1927]:
0              _-1_27
1              _-1_42
2              _-1_67
3              _-1_95
4    91_CCMS_14638_91
5           DIP96__96
6       106_11694_106
dtype: object
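To see what each step of the command contributes, here is a plain-Python sketch of the same pipeline applied to one row at a time, with no pandas involved. The helper name join_row is hypothetical. Like the command above, it assumes every non-string value is a whole number; a value such as 2.7 would be truncated to 2 by int().

```python
import math

def join_row(row, sep='_'):
    # Step 1, mirrors fillna(''): replace NaN with an empty string.
    filled = ['' if isinstance(v, float) and math.isnan(v) else v for v in row]
    # Step 2, mirrors the applymap lambda: keep strings, turn numbers into integer strings.
    as_text = [v if isinstance(v, str) else str(int(v)) for v in filled]
    # Step 3, mirrors agg('_'.join, axis=1): join the cells of the row.
    return sep.join(as_text)

# Rows modelled on the sample above (floats stand in for the '.0' values).
rows = [
    ['', -1.0, 27.0],
    [float('nan'), -1.0, 42.0],
    ['91_CCMS', 14638, 91],
    ['DIP96', '', 96],
]
results = [join_row(r) for r in rows]
```

Strings pass through untouched, which is why the blank columnb in the DIP96 row leaves two consecutive underscores in the result.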

Upvotes: 1
