Economist_Ayahuasca

Reputation: 1642

Saving a DataFrame to a CSV file in Python 3

I have the following data-set:

>>topic_article_Dists = pandas.DataFrame(topicDists)
>>topic_article_Dists.head(10)

                   0          ...                            19
0  (0, 0.00012594461)         ...           (19, 0.00012594461)
1  (0, 0.00013192612)         ...           (19, 0.00013192612)
2  (0, 0.00018656717)         ...             (19, 0.004974284)
3  (0, 0.00012594466)         ...           (19, 0.00012594466)
4    (0, 0.024151485)         ...           (19, 9.2936825e-05)
5  (0, 0.00013262601)         ...           (19, 0.00013262601)
6  (0, 0.00018796993)         ...            (19, 0.0050261705)
7  (0, 0.00026737968)         ...           (19, 0.00026737968)
8  (0, 0.00013698627)         ...           (19, 0.00013698627)
9  (0, 0.00029239763)         ...           (19, 0.00029239766)

For each column, I would like to save (in a CSV file) only the number after the comma, to obtain the following outcome:

              0          ...                      19
0  0.00012594461         ...           0.00012594461
1  0.00013192612         ...           0.00013192612
2  0.00018656717         ...           0.004974284
3  0.00012594466         ...           0.00012594466
4  0.024151485           ...           9.2936825e-05
5  0.00013262601         ...           0.00013262601
6  0.00018796993         ...           0.0050261705
7  0.00026737968         ...           0.00026737968
8  0.00013698627         ...           0.00013698627
9  0.00029239763         ...           0.00029239766

I have tried the command below, but it writes the whole tuples to the file. I am wondering whether I should use regular expressions to do the job.

topic_article_Dists.to_csv("Article-Topic-Distri.csv")   

Upvotes: 1

Views: 78

Answers (1)

jezrael

Reputation: 863411

Use concat with a list comprehension and select the second value of each tuple by indexing with .str[1]:

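import pandas as pd
#import ast

#df is the question's topic_article_Dists
#check the type of the cells first
#print (type(df.iloc[0,0]))
#<class 'str'>

#if the cells are strings, convert them to tuples
#(applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
#df = df.applymap(ast.literal_eval)

print (type(df.iloc[0,0]))
<class 'tuple'>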

df = pd.concat([df[x].str[1] for x in df.columns], axis=1)
print (df)
          0        19
0  0.000126  0.000126
1  0.000132  0.000132
2  0.000187  0.004974
3  0.000126  0.000126
4  0.024151  0.000093
5  0.000133  0.000133
6  0.000188  0.005026
7  0.000267  0.000267
8  0.000137  0.000137
9  0.000292  0.000292
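The answer above stops before the file is written, so here is a minimal end-to-end sketch. The sample data and the names topic_dists, probs and buffer are made up for illustration, and an in-memory buffer stands in for the question's "Article-Topic-Distri.csv" path. The displayed values are only rounded by pandas when printing; the full precision should still be written to the file.

```python
import io
import pandas as pd

# Hypothetical sample shaped like the question's frame:
# every cell is a (topic_id, probability) tuple.
topic_dists = [
    [(0, 0.00012594461), (1, 0.5)],
    [(0, 0.024151485), (1, 9.2936825e-05)],
]
df = pd.DataFrame(topic_dists)

# .str[1] indexes into each cell, so it picks the second item of every tuple.
# astype(float) turns the resulting object columns into numeric columns.
probs = pd.concat(
    [df[col].str[1].astype(float) for col in df.columns],
    axis=1,
)

# Write to an in-memory buffer here; pass a file path instead,
# e.g. probs.to_csv("Article-Topic-Distri.csv"), to write to disk.
buffer = io.StringIO()
probs.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Pass index=False to to_csv if the row labels are not wanted in the file.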

If the values are strings and you want to keep working with strings:

print (type(df.iloc[0,0]))
<class 'str'>

#strip(' )') also removes the space left after the comma
df = pd.concat([df[x].str.split(',').str[1].str.strip(' )') for x in df.columns], axis=1)
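The string route leaves text in every cell, so a conversion to float is probably wanted before saving. A small sketch of that, assuming the cells look exactly like "(0, 0.00012594461)"; the sample frame and the name probs are invented for illustration:

```python
import pandas as pd

# Hypothetical sample: the same kind of data, but stored as strings.
df = pd.DataFrame({
    0: ["(0, 0.00012594461)", "(0, 0.024151485)"],
    19: ["(19, 0.00012594461)", "(19, 9.2936825e-05)"],
})

# Split on the comma, keep the second piece, drop the space and the
# closing parenthesis, then convert the remaining text to float.
probs = pd.concat(
    [
        df[col].str.split(",").str[1].str.strip(" )").astype(float)
        for col in df.columns
    ],
    axis=1,
)

print(probs.dtypes)
```

After this step probs holds numeric columns and can be passed to to_csv like any other DataFrame.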

Upvotes: 1
