Pandas df.to_csv but data missing?

Question

run the following code:

df.info()
df.to_csv(file,index=False,sep=',',encoding='utf_8_sig')
df=pd.DataFrame(pd.read_csv(file,encoding='utf_8_sig'))
df.info()

And the output is:


RangeIndex: 6723 entries, 0 to 6722
Data columns (total 24 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   cmt_id          6723 non-null   int64  
 1   info_id         6723 non-null   int64  
 2   cmt_user_id     6723 non-null   int64  
 3   publish_time    6723 non-null   object 
 4   cmt_time        6723 non-null   object 
 5   diff_time       6723 non-null   float64
 6   platform        6723 non-null   object 
 7   post            6723 non-null   object 
 8   cmt_content     6723 non-null   object 
 9   post_view       4098 non-null   object 
 10  post_like       6723 non-null   object 
 11  post_dislike    6723 non-null   object 
 12  post_cmt        6559 non-null   object 
 13  post_repost     1965 non-null   object 
 14  cmt_like        5630 non-null   object 
 15  cmt_dislike     5630 non-null   object 
 16  cmt_reply       4336 non-null   object 
 17  cmt_repost      4460 non-null   object 
 18  user_gender     3091 non-null   object 
 19  user_score      4572 non-null   object 
 20  user_post_star  4048 non-null   object 
 21  user_reply      993 non-null    object 
 22  user_post       4572 non-null   object 
 23  user_friend     993 non-null    object 
dtypes: float64(1), int64(3), object(20)
memory usage: 1.2+ MB

RangeIndex: 6723 entries, 0 to 6722
Data columns (total 24 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   cmt_id          6723 non-null   int64  
 1   info_id         6723 non-null   int64  
 2   cmt_user_id     6723 non-null   int64  
 3   publish_time    6723 non-null   object 
 4   cmt_time        6723 non-null   object 
 5   diff_time       6723 non-null   float64
 6   platform        6723 non-null   object 
 7   post            6419 non-null   object 
 8   cmt_content     6723 non-null   object 
 9   post_view       4098 non-null   float64
 10  post_like       2081 non-null   float64
 11  post_dislike    1767 non-null   float64
 12  post_cmt        6544 non-null   float64
 13  post_repost     1446 non-null   float64
 14  cmt_like        0 non-null      float64
 15  cmt_dislike     0 non-null      float64
 16  cmt_reply       0 non-null      float64
 17  cmt_repost      124 non-null    float64
 18  user_gender     2622 non-null   object 
 19  user_score      4065 non-null   float64
 20  user_post_star  3579 non-null   float64
 21  user_reply      519 non-null    float64
 22  user_post       4103 non-null   float64
 23  user_friend     148 non-null    float64
dtypes: float64(15), int64(3), object(6)
memory usage: 1.2+ MB

It’s obvious to see some data missing in a few columns when I use Pandas to save my dataframe. There might be some bugs regards to the df.to_csv() function, cuz data is also missing in the csv file. (some online references describe issues about encoding, but that don't work in my settings) Why would that happen? How can I fix it?

Pandas df.to_csv but data missing?

Answers (1)

Related Questions