Reputation: 1624
I convert my dataframe values to str, but when I concatenate them, the values that were previously ints come out with a trailing decimal (.0).
df["newcol"] = df['columna'].map(str) + '_' + df['columnb'].map(str) + '_' + df['columnc'].map(str)
This is giving me output like
500.0
How can I get rid of this trailing decimal? Sometimes the data in columna will contain non-alphanumeric characters.
+---------+---------+---------+------------------+----------------------+
| columna | columnb | columnc | expected | currently getting |
+---------+---------+---------+------------------+----------------------+
| | -1 | 27 | _-1_27 | _-1.0_27.0 |
| | -1 | 42 | _-1_42 | _-1.0_42.0 |
| | -1 | 67 | _-1_67 | _-1.0_67.0 |
| | -1 | 95 | _-1_95 | _-1.0_95.0 |
| 91_CCMS | 14638 | 91 | 91_CCMS_14638_91 | 91_CCMS_14638.0_91.0 |
| DIP96 | 1502 | 96 | DIP96_1502_96 | DIP96_1502.0_96.0 |
| 106 | 11694 | 106 | 106_11694_106 | 00106_11694.0_106.0 |
+---------+---------+---------+------------------+----------------------+
Error:
invalid literal for int() with base 10: ''
Upvotes: 2
Views: 186
Reputation: 684
I couldn't reproduce this error but maybe you could try something like:
df["newcol"] = df['columna'].map(lambda x: str(int(x)) if isinstance(x, int) else str(x)) + '_' + df['columnb'].map(lambda x: str(int(x))) + '_' + df['columnc'].map(lambda x: str(int(x)))
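One caveat with the one-liner above: int() raises ValueError for an empty string (the error quoted in the question) and also for NaN, so columnb and columnc would still fail on missing values. A minimal sketch of a guarded conversion, where safe_int_str is a hypothetical helper name and not something from pandas:

```python
import math

def safe_int_str(x):
    """Hypothetical helper: render whole-number floats without '.0'."""
    if isinstance(x, str):
        return x                   # leave text (including '') untouched
    if isinstance(x, float):
        if math.isnan(x):
            return ''              # assumption: treat NaN as an empty field
        if x.is_integer():
            return str(int(x))     # 500.0 -> '500'
    return str(x)                  # ints and non-whole floats pass through str()

# int() itself rejects both problem values with ValueError:
for bad in ('', float('nan')):
    try:
        int(bad)
    except ValueError as exc:
        print(repr(bad), '->', exc)
```

It could then be used per column, e.g. df['columnb'].map(safe_int_str), in place of the lambda.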
Upvotes: 1
Reputation: 25269
Edit:
If your df has more than 3 columns and you want to join only 3 of them, you can select those columns with column slicing. Assume your df has 5 columns named AA, BB, CC, DD, EE, and you want to join only columns CC, DD, EE. You just need to specify those 3 columns before the fillna and assign the result to newcol:
df["newcol"] = df[['CC', 'DD', 'EE']].fillna('') \
.applymap(lambda x: x if isinstance(x, str) else str(int(x))).agg('_'.join, axis=1)
Note: I broke the command into 2 lines using '\' for easier reading.
Original:
I guess the real data in columna, columnb, and columnc contains str, float, int, empty strings, blank spaces, and maybe even NaN. A float with a fractional part of .00 in a column of dtype object will show without the decimal.
Assume your df has only 3 columns: columna, columnb, columnc, as you said. The command below handles str, float, int, and NaN, and joins the 3 columns into one as you want:
df.fillna('').applymap(lambda x: x if isinstance(x, str) else str(int(x))).agg('_'.join, axis=1)
I created a sample similar to yours:
   columna columnb columnc
0               -1      27
1      NaN      -1      42
2               -1      67
3               -1      95
4  91_CCMS   14638      91
5    DIP96              96
6      106   11694     106
Your command returns concatenated strings containing '.0', as you described:
df['columna'].map(str) + '_' + df['columnb'].map(str) + '_' + df['columnc'].map(str)
Out[1926]:
0 _-1.0_27.0
1 nan_-1.0_42.0
2 _-1.0_67.0
3 _-1.0_95.0
4 91_CCMS_14638_91
5 DIP96__96
6 106_11694_106
dtype: object
Using my command:
df.fillna('').applymap(lambda x: x if isinstance(x, str) else str(int(x))).agg('_'.join, axis=1)
Out[1927]:
0 _-1_27
1 _-1_42
2 _-1_67
3 _-1_95
4 91_CCMS_14638_91
5 DIP96__96
6 106_11694_106
dtype: object
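For anyone who wants to reproduce this end to end, here is a self-contained sketch under a few assumptions: the sample values are my guess at the data above (blank cells are taken to be empty strings, numeric columns are float), and to_clean_str is a hypothetical helper, not a pandas function. Unlike str(int(x)), it leaves a float such as 1.5 as '1.5' rather than truncating it. It uses a row-wise apply instead of applymap, which as far as I know is deprecated since pandas 2.1 in favor of DataFrame.map.

```python
import numpy as np
import pandas as pd

def to_clean_str(x):
    """Hypothetical helper: text as-is, NaN -> '', whole floats without '.0'."""
    if isinstance(x, str):
        return x
    if pd.isna(x):
        return ''
    if isinstance(x, float) and x.is_integer():
        return str(int(x))
    return str(x)

# Assumed sample data, modelled on the table above.
df = pd.DataFrame({
    'columna': ['', np.nan, '', '', '91_CCMS', 'DIP96', 106],
    'columnb': [-1.0, -1.0, -1.0, -1.0, 14638.0, np.nan, 11694.0],
    'columnc': [27.0, 42.0, 67.0, 95.0, 91.0, 96.0, 106.0],
})

cols = ['columna', 'columnb', 'columnc']
# Join the cleaned values of each row with '_'.
df['newcol'] = df[cols].apply(
    lambda row: '_'.join(to_clean_str(v) for v in row), axis=1
)
print(df['newcol'])
```

A row-wise apply is slower than vectorized string operations on large frames, so this is meant as a readable starting point rather than a tuned solution.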
Upvotes: 1