developer
developer

Reputation: 267

How to concatenate characters of certain columns of a dataframe?

I have a dataframe that has domain names. But the problem is every character of the domain name is in single cell of a dataframe. Below is how it looks. the 'Column' is just column name for the first column.

testing = pd.DataFrame({'col':['h','h'],
                        'Unnamed :1':['t','t'],
                        'Unnamed :2':['t','t'],
                        'Unnamed :3':['p','p'],
                        'Unnamed :4':['s',':']})


print (testing)
  col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4
0   h          t          t          p          s
1   h          t          t          p          :

I wish to concatenate every column and the resultant should look like

https
http:

My code : I read the excel sheet which has data, convert to dataframe and see if the first column of every row has one character or a string. If it is a character, I have to concatenate all the characters present in that entire row.

testing = pd.read_excel("path to .xlsx file")  
for i in range(len(testing)):      
    if len(testing.iloc[i,0]) == 1:
        testing.iloc[i,0] = testing.astype(str).values.sum(axis=1)

But this gives:

['https' 'http:' 'http:' 'http:' 'http:']

['https' 'http:' 'http:' 'http:' 'http:']

Upvotes: 1

Views: 479

Answers (2)

jezrael
jezrael

Reputation: 863166

Here loops are not necessary, assign to first column with iloc and : for all rows:

testing = pd.read_excel("path to .xlsx file")  
testing.iloc[:, 0] = testing.astype(str).values.sum(axis=1)
print (testing)
     col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4
0  https          t          t          p          s
1  http:          t          t          p          :

EDIT: If need test first column for length first select by DataFrame.iloc and then test by Series.str.len, last set by values by DataFrame.where to empty strings:

testing = pd.DataFrame({'col':['something','h'],
                        'Unnamed :1':['t','t'],
                        'Unnamed :2':['t','t'],
                        'Unnamed :3':['p','p'],
                        'Unnamed :4':['s',':']})

mask = testing.iloc[:, 0].str.len() == 1
testing.iloc[:, 0] = testing.astype(str).where(mask, '').values.sum(axis=1)
print (testing)
     col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4
0                 t          t          p          s
1  http:          t          t          p          :

Upvotes: 1

SmileyProd
SmileyProd

Reputation: 796

Here is a solution that checks if the first column for every row has one character:

testing = pd.DataFrame({'col':['h','h', 'erqi'],
                   'Unnamed :1':['t','t', 'tsdf'],
                   'Unnamed :2':['t','t', 'rs'],
                   'Unnamed :3':['p','p', 'sf'],
                   'Unnamed :4':['s',':', 'sdf']})

mask = testing.col.str.len().eq(1)
testing["address"] = np.where(mask, testing.astype(str).values.sum(axis=1), "")

Output:

    col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4 address
0     h          t          t          p          s   https
1     h          t          t          p          :   http:
2  erqi       tsdf         rs         sf        sdf        

Upvotes: 1

Related Questions