Reputation: 267
I have a dataframe that has domain names. But the problem is every character of the domain name is in single cell of a dataframe. Below is how it looks. the 'Column' is just column name for the first column.
testing = pd.DataFrame({'col':['h','h'],
'Unnamed :1':['t','t'],
'Unnamed :2':['t','t'],
'Unnamed :3':['p','p'],
'Unnamed :4':['s',':']})
print (testing)
col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4
0 h t t p s
1 h t t p :
I wish to concatenate every column and the resultant should look like
https
http:
My code : I read the excel sheet which has data, convert to dataframe and see if the first column of every row has one character or a string. If it is a character, I have to concatenate all the characters present in that entire row.
testing = pd.read_excel("path to .xlsx file")
for i in range(len(testing)):
if len(testing.iloc[i,0]) == 1:
testing.iloc[i,0] = testing.astype(str).values.sum(axis=1)
But this gives:
['https' 'http:' 'http:' 'http:' 'http:']
['https' 'http:' 'http:' 'http:' 'http:']
Upvotes: 1
Views: 479
Reputation: 863166
Here loops are not necessary, assign to first column with iloc
and :
for all rows:
testing = pd.read_excel("path to .xlsx file")
testing.iloc[:, 0] = testing.astype(str).values.sum(axis=1)
print (testing)
col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4
0 https t t p s
1 http: t t p :
EDIT: If need test first column for length first select by DataFrame.iloc
and then test by Series.str.len
, last set by values by DataFrame.where
to empty strings:
testing = pd.DataFrame({'col':['something','h'],
'Unnamed :1':['t','t'],
'Unnamed :2':['t','t'],
'Unnamed :3':['p','p'],
'Unnamed :4':['s',':']})
mask = testing.iloc[:, 0].str.len() == 1
testing.iloc[:, 0] = testing.astype(str).where(mask, '').values.sum(axis=1)
print (testing)
col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4
0 t t p s
1 http: t t p :
Upvotes: 1
Reputation: 796
Here is a solution that checks if the first column for every row has one character:
testing = pd.DataFrame({'col':['h','h', 'erqi'],
'Unnamed :1':['t','t', 'tsdf'],
'Unnamed :2':['t','t', 'rs'],
'Unnamed :3':['p','p', 'sf'],
'Unnamed :4':['s',':', 'sdf']})
mask = testing.col.str.len().eq(1)
testing["address"] = np.where(mask, testing.astype(str).values.sum(axis=1), "")
Output:
col Unnamed :1 Unnamed :2 Unnamed :3 Unnamed :4 address
0 h t t p s https
1 h t t p : http:
2 erqi tsdf rs sf sdf
Upvotes: 1