Reputation: 4661
I need to iterate over each row of a pandas DataFrame and turn it into a comma-separated string.
Example:
import numpy as np
import pandas as pd
df3 = pd.DataFrame(np.random.randn(10, 5),
                   columns=['a', 'b', 'c', 'd', 'e'])
a b c d e
0 -0.158897 -0.749799 0.268921 0.070035 0.099600
1 -0.863654 -0.086814 -0.614562 -1.678850 0.980292
2 -0.098168 0.710652 -0.456274 -0.373153 -0.533463
3 1.001634 -0.736187 -0.812034 0.223062 -1.337972
4 0.173549 -0.576412 -1.016063 -0.217242 0.443794
5 0.273695 0.335562 0.778393 -0.668368 0.438880
6 -0.783824 1.439888 1.057639 -1.825481 -0.770953
7 -1.025004 0.155974 0.645023 0.993379 -0.812133
8 0.953448 -1.355628 -1.918317 -0.966472 -0.618744
9 -0.479297 0.295150 -0.294449 0.679416 -1.813078
I'd like to get for each row:
'-0.158897,-0.749799,0.268921,0.070035,0.099600'
'-0.863654,-0.086814,-0.614562,-1.678850,0.980292'
... and so on
Upvotes: 45
Views: 108875
Reputation: 78
The best way is to use to_csv. All other answers use hacks to get the data into comma-separated format, but this method is designed to do precisely that.
(Dennis Golomazov's answer uses this method, but, as he points out, his way will fail if cells contain \n characters.)
One-liner:
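import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5),
                  columns=['a', 'b', 'c', 'd', 'e'])
# rstrip removes the trailing line terminator ('\n', or '\r\n' on Windows)
print([df.iloc[i:i+1].to_csv(header=False, index=False).rstrip('\r\n') for i in range(len(df))])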
Note: as per the to_csv docs, if no path or buffer is given, the method returns the CSV output as a string instead of writing a file, so it can be printed directly:
path_or_buf: default None. If None, the result is returned as a string.
You may wish to make the lack of filename explicit in your code:
print([df.iloc[i:i+1].to_csv(path_or_buf=None, header=False, index=False).rstrip('\r\n') for i in range(len(df))])
(Specifying this is strictly unnecessary, since the default value for path_or_buf is None anyway, but it makes clear that you do not intend to save to a CSV file.)
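As a quick check of the \n point above, here is a small sketch with a hypothetical two-row frame whose first value in column b contains a newline. Formatting each row separately with to_csv keeps the quoted newline inside that row's string, whereas formatting the whole frame and splitting it into lines breaks that row in two.

```python
import pandas as pd

# Hypothetical frame: the first value in column 'b' contains a newline
df = pd.DataFrame({'a': [1, 2], 'b': ['x\ny', 'z']})

# One to_csv call per row; to_csv quotes the cell, so the newline stays inside it
per_row = [df.iloc[i:i+1].to_csv(header=False, index=False).rstrip('\r\n')
           for i in range(len(df))]

# One to_csv call for the whole frame, then split into lines
split_all = df.to_csv(header=False, index=False).splitlines()

print(len(per_row), per_row)      # one string per row
print(len(split_all), split_all)  # the quoted cell is split across two strings
```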
Upvotes: 0
Reputation: 188
A simple (but probably slow) approach that works if you merely want to display your output:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3),
columns=['a', 'b', 'c'])
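for row in df.values:
    # tolist() gives plain Python floats; on NumPy 2.x numpy scalars would print as np.float64(...)
    print(str(tuple(row.tolist()))[1:-1])  # the str slice removes the parentheses from each line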
output:
-0.3998584095092047, -1.3422828824038897, 1.4978979391522116
-0.42052539593647986, 0.14744089429376178, -0.19562046086556256
-0.8869262212387541, 1.1555551779212356, 0.49792754484559537
0.8856088499836354, -0.12525995120110522, -0.6203181437240001
0.13226145977531473, -0.1556963039277225, -1.124345242588383
If you need to export your output, you can use the same approach but collect the strings in another data structure. This example builds a new DataFrame in which each row holds one comma-separated string (using a list comprehension instead of a for loop):
pd.DataFrame([str(tuple(row.tolist()))[1:-1] for row in df.values])
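One caveat with the str(tuple(row))[1:-1] trick: a one-element tuple prints as (x,), so on a single-column frame every line keeps a trailing comma. A small sketch with a hypothetical single-column frame, comparing it with joining the values directly (', '.join(map(str, ...)) is a swapped-in alternative, not part of the answer above; row.tolist() is used so the values are plain Python floats):

```python
import pandas as pd

# Hypothetical single-column frame
df = pd.DataFrame({'a': [1.5, -0.25]})

# A one-element tuple prints as "(1.5,)", so slicing off the parentheses leaves a trailing comma
via_tuple = [str(tuple(row.tolist()))[1:-1] for row in df.values]

# Joining the values directly has no special case for a single column
via_join = [', '.join(map(str, row.tolist())) for row in df.values]

print(via_tuple)
print(via_join)
```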
Upvotes: 0
Reputation: 5055
Here is a one-liner:
df3['Combo'] = df3.astype(str).apply(', '.join, axis=1)
This creates a new DataFrame column whose values are comma-separated strings containing the contents of all the other columns.
before:
import numpy as np
import pandas as pd
df3 = pd.DataFrame(np.random.randn(10, 5),
columns=['a', 'b', 'c', 'd', 'e'])
a b c d e
0 0.870579 -1.356070 -0.169689 -0.148766 -1.520965
1 0.292316 -1.703772 -1.245149 -1.565364 -1.896858
2 -2.204210 -0.073636 -0.457303 -0.547852 0.876874
3 1.021075 -1.227874 -0.792560 1.628169 -0.685461
4 0.699579 0.736821 1.143053 1.189183 1.553324
5 -2.166749 1.011902 -0.605816 1.184308 -0.427205
6 1.965086 -0.053822 0.100614 1.045595 -0.464474
7 2.385780 0.540920 0.790506 1.148555 -1.139325
8 -0.581308 -0.575956 0.285963 -0.535575 -0.195980
9 1.535928 0.927238 -0.513897 0.711812 1.172479
the code:
df3['Combo'] = df3.astype(str).apply(', '.join, axis=1)
after:
a b c d e Combo
0 0.870579 -1.356070 -0.169689 -0.148766 -1.520965 0.8705793801134621, -1.356070467974009, -0.169...
1 0.292316 -1.703772 -1.245149 -1.565364 -1.896858 0.29231630010496074, -1.7037715557607054, -1.2...
2 -2.204210 -0.073636 -0.457303 -0.547852 0.876874 -2.2042103265194823, -0.07363572968327593, -0....
3 1.021075 -1.227874 -0.792560 1.628169 -0.685461 1.0210749768623664, -1.227874362438, -0.792560...
4 0.699579 0.736821 1.143053 1.189183 1.553324 0.6995791505249452, 0.7368206760352145, 1.1430...
5 -2.166749 1.011902 -0.605816 1.184308 -0.427205 -2.166749201299601, 1.0119015881974436, -0.605...
6 1.965086 -0.053822 0.100614 1.045595 -0.464474 1.9650863537016798, -0.05382210788746324, 0.10...
7 2.385780 0.540920 0.790506 1.148555 -1.139325 2.3857802491033384, 0.5409195922501099, 0.7905...
8 -0.581308 -0.575956 0.285963 -0.535575 -0.195980 -0.5813081184052638, -0.5759559119431503, 0.28...
9 1.535928 0.927238 -0.513897 0.711812 1.172479 1.5359276629230108, 0.927237601422893, -0.5138...
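Two details of the one-liner worth knowing: it joins with ', ' (comma plus space), while the question asks for plain commas, and because it uses every column, running it a second time folds the existing Combo column into the new strings. A sketch that lists the source columns explicitly and joins with ',' (the cols list is an assumption about which columns you want to combine):

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.random.randn(10, 5),
                   columns=['a', 'b', 'c', 'd', 'e'])

# Source columns listed explicitly so 'Combo' itself is never included
cols = ['a', 'b', 'c', 'd', 'e']
df3['Combo'] = df3[cols].astype(str).apply(','.join, axis=1)

# Running the same line again still gives five comma-separated values per row
df3['Combo'] = df3[cols].astype(str).apply(','.join, axis=1)
print(df3['Combo'].str.count(',').unique())
```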
Upvotes: 6
Reputation: 11192
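Another solution would be the following, which returns a flat list with every cell as a separate string (the row boundaries are lost):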
df3.astype(str).values.flatten().tolist()
O/P:
['1.1298859039670908', '-1.1777990747688836', '-0.6863185575934238', '0.5728124523079394', '-1.7233889745416526', '1.2666884675345114', '-1.3370517489515568', '-1.1573192462004067', '-0.290889463035692', '0.7013992501326347', '-0.09235695278417168', '1.3398108023557909', '0.9348249877283498', '-1.420127356751191', '-0.23280615612717087', '-1.513041006340331', '0.06922064806964501', '0.5021357843647933', '0.4959105452630504', '0.23892842483496426', '0.332581693920347', '-0.9182302226268196', '0.4043812352905833', '1.2214146329445081', '-1.875277093248708', '0.3102747423859147', '-0.12406718601423607', '0.5281816415364707', '-1.9067143330181668', '0.8256856659897251', '2.294853355922203', '0.43835574399588956', '-1.1421958903284741', '1.1281755826789093', '-1.6942129677694633', '2.0015273318589077', '0.22546177660127778', '0.8744192315520689', '0.9149788977962425', '0.03312768429116076', '-0.8790198630064502', '1.1123149455982901', '1.0360823000160735', '0.3897776338002864', '1.6653054797315376', '-0.7959569835943457', '0.48684356819991087', '-0.1753603906083526', '1.3546473604252465', '0.8654506220249256']
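If you need one quoted string per row instead, use the following (it joins the values with spaces; replace ' '.join with ','.join to get comma-separated strings):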
r = [' '.join(val) for val in df3.astype(str).values.tolist()]
O/P:
['0.3453242505851785 0.8361952965566127 1.2140062332333457 -0.8449248124906361 -0.6596860872608944', '-1.9416389611147358 -0.4633998192182761 1.3156114084151638 0.31541640373981894 0.10017585641945598', '0.019222312957353865 -0.11572754659609137 -0.7475957688634534 1.732958781671217 0.8924926838936247', '1.2809958570913833 -0.5157436785751306 -0.2568307974248332 1.6223279831092197 1.4686281000013306', '0.2487576796276271 0.8129564817069422 0.8887583094926109 -0.8716446795448696 0.3920966638278787', '0.8033846996636256 -0.6320480733526924 0.17875269847270434 -0.5659865172511531 0.2259891796497471', '-1.6220463818040864 0.690201620286483 -0.7124446718694878 -0.271001366710889 1.1809699288238422', '1.800615079476972 0.04891756117369832 -1.1063732305386178 0.13042352385167277 0.5329078065025347', '0.00021395065919010197 -0.6429306637453445 -0.4281903648631154 0.2640659501478122 -0.3906892322707482', '-0.4159606749623029 0.7992377301053033 -0.8126018881734699 -1.2516267025391803 -0.17085205523095087']
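To make the difference between the two snippets concrete: the first returns one string per cell, the second one string per row. A sketch that counts both, joining with ',' instead of ' ' to match the question's format:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.random.randn(10, 5),
                   columns=['a', 'b', 'c', 'd', 'e'])

# Flat list: one string per cell (10 rows * 5 columns), row boundaries are lost
flat = df3.astype(str).values.flatten().tolist()

# One comma-separated string per row
rows = [','.join(val) for val in df3.astype(str).values.tolist()]

print(len(flat), len(rows))
print(rows[0])
```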
Upvotes: 12
Reputation: 17339
Use to_csv:
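import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5),
                  columns=['a', 'b', 'c', 'd', 'e'])
# splitlines() also copes with the '\r\n' line terminator used on Windows
df.to_csv(header=False, index=False).splitlines()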
['-1.60092768589,-0.746496859432,0.662527724304,-0.677984969682,1.70656657572',
'-0.432306620615,-0.396499851892,0.564494290965,-1.01196068617,-0.630576490671',
'-3.28916785414,0.627240166663,-0.359262938883,0.344156143177,-0.911269843378',
'-0.272741450301,0.0594234886507,-2.72800253986,-0.821610087419,-0.0668212419497',
'0.303490090149,-1.61344483051,0.117046351282,-1.46936429231,-0.66018613208',
'-1.18157229705,-0.766519504863,0.386180129978,0.945274532852,-0.783459830884',
'-1.27118723107,-1.12478330038,-0.625470220821,-0.453053132109,0.0641830786961',
'-1.02657336234,-1.01556460318,0.445282883845,0.589873985417,-0.833648685855',
'0.742343897524,-1.69644542886,-1.03886940911,0.511317569685,1.87084848086',
'-0.159125435887,1.02522202275,0.254459603867,-0.487187861352,2.31900012693']
Note: this needs to be improved if your cells contain \n characters, because splitting on newlines also breaks those cells apart.
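One way to handle cells containing newlines, sketched with the standard-library csv module instead of to_csv (the helper row_to_csv is hypothetical): format each row on its own, so the quoted newline stays inside that row's string rather than being split on.

```python
import csv
import io

import pandas as pd

# Hypothetical frame: the first value in column 'b' contains a newline
df = pd.DataFrame({'a': [1, 2], 'b': ['x\ny', 'z']})


def row_to_csv(values):
    """Format one row as a CSV string; csv.writer quotes fields containing newlines."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerow(values)
    return buf.getvalue()[:-1]  # drop the '\n' that writerow appends


lines = [row_to_csv(row) for row in df.itertuples(index=False)]
print(lines)
```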
Upvotes: 21
Reputation: 29690
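You could use pandas.DataFrame.to_string with some optional arguments set to False and then split on newline characters to get a list of your strings. This feels a little dirty, though, and to_string formats floats with the display precision (6 decimal places by default), so the values are rounded.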
x = df3.to_string(header=False,
index=False,
index_names=False).split('\n')
vals = [','.join(ele.split()) for ele in x]
print(vals)
Outputs:
['1.221365,0.923175,-1.286149,-0.153414,-0.005078', '-0.231824,-1.131186,0.853728,0.160349,1.000170', '-0.147145,0.310587,-0.388535,0.957730,-0.185315', '-1.658463,-1.114204,0.760424,-1.504126,0.206909', '-0.734571,0.908569,-0.698583,-0.692417,-0.768087', '0.000029,0.204140,-0.483123,-1.064851,-0.835931', '-0.108869,0.426260,0.107286,-1.184402,0.434607', '-0.692160,-0.376433,0.567188,-0.171867,-0.822502', '-0.564726,-1.084698,-1.065283,-2.335092,-0.083357', '-1.429049,0.790535,-0.547701,-0.684346,2.048081']
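Because to_string is a display method, the strings it produces are rounded to the display precision. A small sketch with a hypothetical frame, comparing the to_string route with to_csv, which writes the full value:

```python
import pandas as pd

# Hypothetical frame with a value that has more than 6 decimal places
df3 = pd.DataFrame({'a': [0.123456789, -1.5], 'b': [2.0, 3.25]})

# to_string route from the answer above: values are rounded for display
x = df3.to_string(header=False, index=False).split('\n')
vals = [','.join(ele.split()) for ele in x]

# to_csv keeps the full value
full = df3.to_csv(header=False, index=False).splitlines()

print(vals)
print(full)
```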
Upvotes: 41
Reputation: 862681
You can convert the DataFrame to a numpy.array with values and then generate the strings:
b = '\n'.join(','.join('%0.3f' %x for x in y) for y in df.values)
print (b)
-1.245,-0.397,-0.374,0.698,-0.057
-1.695,-1.593,0.992,-1.839,0.980
1.154,-0.322,-0.583,1.022,1.800
-1.705,0.148,-0.670,0.164,0.902
1.573,-1.082,-0.243,-1.190,0.832
2.535,-1.168,-0.258,-2.617,-0.766
1.990,0.607,-0.115,0.114,0.175
-0.652,0.245,-1.501,0.145,-0.079
-1.977,3.543,-0.454,1.697,-0.648
-0.756,0.561,-1.294,-0.747,-0.323
If you need the strings in a list:
b = [','.join('%0.3f' % x for x in y) for y in df.values]
print (b)
['-1.139,0.257,-1.132,-0.987,1.194', '0.799,-1.061,-1.073,-0.176,0.528', '0.527,0.333,-0.185,-0.496,0.115', '-1.567,0.268,-1.457,2.121,-0.065', '-0.854,-2.344,0.747,0.208,-0.403', '1.850,0.084,1.890,-1.458,0.427', '1.649,0.134,-2.314,1.618,0.658', '2.178,-0.823,-0.499,0.083,-0.269', '-0.781,-0.212,1.623,-0.053,0.436', '0.842,-0.167,1.914,-0.087,0.717']
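If you prefer to let NumPy do the formatting, numpy.savetxt accepts the same %-style format plus a delimiter and can write to an in-memory buffer. A sketch of this swapped-in alternative (not the answer's method) that checks it produces the same strings as the join-based version:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# savetxt applies '%0.3f' to every value and joins each row with ','
buf = io.StringIO()
np.savetxt(buf, df.values, fmt='%0.3f', delimiter=',')
b = buf.getvalue().splitlines()

# The join-based version from the answer above, for comparison
expected = [','.join('%0.3f' % x for x in y) for y in df.values]
print(b == expected)
```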
Upvotes: 4