Newbie14
Newbie14

Reputation: 33

Combine two columns of numbers in dataframe into single column using pandas/python

I'm very new to Pandas and Python.

I have a 3226 x 61 dataframe and I would like to combine two columns into a single one.

The two columns I would like to combine are both integers - one has either one or two digits (1 through 52) while the other has three digits (e.g., 1 or 001, 23 or 023). I need the output to be a five digit integer (e.g., 01001 or 52023). There will be no mathematical operations with the resulting integers - I will need them only for look-up purposes.

Based on some of the other posts on this fantastic site, I tried the following:

df['YZ'] = df['Y'].map(str) + df['Z'].map(str)

But that returns "1.00001 for a first column of "1" and second column of "001", I believe because making "1" a str turns it into "1.0", which "001" is added to the end.

I've also tried:

df['YZ'] = df['Y'].join(df['Z'])

Getting the following error:

AttributeError: 'Series' object has no attribute 'join'

I've also tried:

df['Y'] = df['Y'].astype(int)
df['Z'] = df['Z'].astype(int)
df['YZ'] = df[['Y','Z']].apply(lambda x: ''.join(x), axis=1)

Getting the following error:

TypeError: ('sequence item 0: expected str instance, numpy.int32 

found', 'occurred at index 0')

A copy of the columns is below:

1   1
1   3
1   5
1   7
1   9
1   11
1   13

I understand there are two issues here:

Frankly, I need help with both but would be most appreciative of the column combining problem.

Upvotes: 3

Views: 5368

Answers (1)

jezrael
jezrael

Reputation: 863146

I think you need convert columns to string, add 0 by zfill and simply sum by +:

df['YZ'] = df['Y'].astype(str).str.zfill(2) + df['Z'].astype(str).str.zfill(3)

Sample:

df=pd.DataFrame({'Y':[1,3,5,7], 'Z':[10,30,51,74]})
print (df)
   Y   Z
0  1  10
1  3  30
2  5  51
3  7  74

df['YZ'] = df['Y'].astype(str).str.zfill(2) + df['Z'].astype(str).str.zfill(3)
print (df)
   Y   Z     YZ
0  1  10  01010
1  3  30  03030
2  5  51  05051
3  7  74  07074

If need also change original columns:

df['Y'] = df['Y'].astype(str).str.zfill(2)
df['Z'] = df['Z'].astype(str).str.zfill(3)
df['YZ'] = df['Y'] + df['Z']
print (df)
    Y    Z     YZ
0  01  010  01010
1  03  030  03030
2  05  051  05051
3  07  074  07074

Solution with join:

df['Y'] = df['Y'].astype(str).str.zfill(2)
df['Z'] = df['Z'].astype(str).str.zfill(3)
df['YZ'] = df[['Y','Z']].apply('-'.join, axis=1)
print (df)
    Y    Z      YZ
0  01  010  01-010
1  03  030  03-030
2  05  051  05-051
3  07  074  07-074

And without change original columns:

df['YZ'] = df['Y'].astype(str).str.zfill(2) + '-' + df['Z'].astype(str).str.zfill(3)
print (df)
   Y   Z      YZ
0  1  10  01-010
1  3  30  03-030
2  5  51  05-051
3  7  74  07-074

Upvotes: 3

Related Questions