Reputation: 7448
I am trying to take the strings from a subset of a DataFrame's columns, concatenate them row by row into a single string, and collect the results in a list:
# row_subset is a sub-DataFrame of some DataFrame
sub_columns = ['A', 'B', 'C']
string_list = [""] * row_subset.shape[0]
for x in range(0, row_subset.shape[0]):
    for y in range(0, len(sub_columns)):
        string_list[x] += str(row_subset[sub_columns[y]].iloc[x])
so that the result looks like this:
['row 0 concatenation', 'row 1 concatenation', 'row 2 concatenation', 'row 3 concatenation']
Is there a better, more efficient way to do this?
Upvotes: 0
Views: 865
Reputation: 862781
I think you need to select the columns with [] first and then call sum(axis=1); if you need a separator, use apply with join instead:
import pandas as pd

# sample data: A, B, C, F hold strings, D and E hold integers
df = pd.DataFrame({'A':list('abcdef'),
                   'B':list('qwerty'),
                   'C':list('fertuj'),
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})
print (df)
   A  B  C  D  E  F
0  a  q  f  1  5  a
1  b  w  e  3  3  a
2  c  e  r  5  6  a
3  d  r  t  7  9  b
4  e  t  u  1  2  b
5  f  y  j  0  4  b
sub_columns = ['A', 'B', 'C']
print (df[sub_columns].sum(axis=1).tolist())
['aqf', 'bwe', 'cer', 'drt', 'etu', 'fyj']
print (df[sub_columns].apply(' '.join, axis=1).tolist())
['a q f', 'b w e', 'c e r', 'd r t', 'e t u', 'f y j']
A very similar NumPy solution, working on the underlying array:
print (df[sub_columns].values.sum(axis=1).tolist())
['aqf', 'bwe', 'cer', 'drt', 'etu', 'fyj']
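One caveat: the loop in the question calls str() on every value, so the columns may not all hold strings, and join raises a TypeError as soon as it meets a non-string item. A minimal sketch of one way around that, assuming it is acceptable to cast the selected columns with astype(str) first. The small frame and the names as_text, no_sep and with_sep are made up for illustration:

```python
import pandas as pd

# hypothetical frame mixing a string column with integer columns
df = pd.DataFrame({'A': list('abc'),
                   'D': [1, 3, 5],
                   'E': [5, 3, 6]})

sub_columns = ['A', 'D', 'E']

# cast every selected column to str so that join accepts each item
as_text = df[sub_columns].astype(str)

# join each row without a separator, then with a space
no_sep = as_text.apply(''.join, axis=1).tolist()
with_sep = as_text.apply(' '.join, axis=1).tolist()

print(no_sep)
print(with_sep)
```

If every selected column already holds strings, the astype(str) step is not needed and the lines above it apply as they are.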
Upvotes: 4