erdemgunenc

Reputation: 997

Python pandas show repeated values

I'm trying to read data from a txt file with pandas.read_csv, but it doesn't show repeated (identical) values from the file. For example, the value 2043 appears in several rows of the file, but it is shown only once rather than in every row.

My file sample

[Screenshot of the file sample]

Result set

[Screenshot of the resulting DataFrame]

All the cells I've circled should also contain 2043, but they are empty.

My code is:

import pandas as pd

df = pd.read_csv('samplefile.txt', sep='\t', header=None,
                 names=["234", "235", "236"])

Upvotes: 1

Views: 380

Answers (2)

slackline

Reputation: 2417

A word of warning about MultiIndex: I was bitten by this yesterday and wasted time trying to troubleshoot a non-existent problem.

If one of your index levels is of type float64, you may find that its values are not shown in full. I was running df.groupby().describe() on a dataframe, and the variable I was grouping by was originally a long int. At some point it had been converted to a float, and when the result was printed this index was rounded. There were a number of values very close to each other, so on printing it appeared that groupby() had found several values of the second index under a single value of the first.

That's not very clear, so here is an illustrative example...

import numpy as np
import pandas as pd

index = np.random.uniform(low=89908893132829,
                          high=89908893132929,
                          size=(50,))
df = pd.DataFrame({'obs': np.arange(100)},
                  index=np.append(index, index)).sort_index()
df.index.name = 'index1'
df['index2'] = [1, 2] * 50
df.reset_index(inplace=True)
df.set_index(['index1', 'index2'], inplace=True)

Look at the dataframe and it appears that index1 has only one value...

df.head(10)
                     obs
index1       index2     
8.990889e+13 1         4
             2        54
             1        61
             2        11
             1        89
             2        39
             1        65
             2        15
             1        60
             2        10

Run groupby(['index1', 'index2']).describe() and, again, it looks like index1 has only one value...

summary = df.groupby(['index1', 'index2']).describe()
summary.head()
                      obs                                        
                    count  mean std   min   25%   50%   75%   max
index1       index2                                              
8.990889e+13 1        1.0   4.0 NaN   4.0   4.0   4.0   4.0   4.0
             2        1.0  54.0 NaN  54.0  54.0  54.0  54.0  54.0
             1        1.0  61.0 NaN  61.0  61.0  61.0  61.0  61.0
             2        1.0  11.0 NaN  11.0  11.0  11.0  11.0  11.0
             1        1.0  89.0 NaN  89.0  89.0  89.0  89.0  89.0

But if you look at the actual values of index1 in either dataframe, you see that there are multiple unique values. In the original dataframe...

df.index.get_level_values('index1')

Float64Index([89908893132833.12, 89908893132833.12, 89908893132834.08,
              89908893132834.08, 89908893132835.05, 89908893132835.05,
               89908893132836.3,  89908893132836.3, 89908893132837.95,
              89908893132837.95,  89908893132838.1,  89908893132838.1,
               89908893132838.6,  89908893132838.6, 89908893132841.89,
              89908893132841.89, 89908893132841.95, 89908893132841.95,
              89908893132845.81, 89908893132845.81, 89908893132845.83,
              89908893132845.83, 89908893132845.88, 89908893132845.88,
              89908893132846.02, 89908893132846.02,  89908893132847.2,
               89908893132847.2, 89908893132847.67, 89908893132847.67,
               89908893132848.5,  89908893132848.5,  89908893132848.5,
               89908893132848.5, 89908893132855.17, 89908893132855.17,
              89908893132855.45, 89908893132855.45, 89908893132864.62,
              89908893132864.62, 89908893132868.61, 89908893132868.61,
              89908893132873.16, 89908893132873.16,  89908893132875.6,
               89908893132875.6, 89908893132875.83, 89908893132875.83,
              89908893132878.73, 89908893132878.73,  89908893132879.9,
               89908893132879.9, 89908893132880.67, 89908893132880.67,
              89908893132880.69, 89908893132880.69, 89908893132881.31,
              89908893132881.31, 89908893132881.69, 89908893132881.69,
              89908893132884.45, 89908893132884.45, 89908893132887.27,
              89908893132887.27, 89908893132887.83, 89908893132887.83,
               89908893132892.8,  89908893132892.8, 89908893132894.34,
              89908893132894.34,  89908893132894.5,  89908893132894.5,
              89908893132901.88, 89908893132901.88, 89908893132903.27,
              89908893132903.27, 89908893132904.53, 89908893132904.53,
              89908893132909.27, 89908893132909.27, 89908893132910.38,
              89908893132910.38, 89908893132911.86, 89908893132911.86,
               89908893132913.4,  89908893132913.4, 89908893132915.73,
              89908893132915.73, 89908893132916.06, 89908893132916.06,
              89908893132922.48, 89908893132922.48, 89908893132923.44,
              89908893132923.44, 89908893132924.66, 89908893132924.66,
              89908893132925.14, 89908893132925.14, 89908893132928.28,
              89908893132928.28],
             dtype='float64', name='index1')

...and in the summarised dataframe...

summary.index.get_level_values('index1')

Float64Index([89908893132833.12, 89908893132833.12, 89908893132834.08,
              89908893132834.08, 89908893132835.05, 89908893132835.05,
               89908893132836.3,  89908893132836.3, 89908893132837.95,
              89908893132837.95,  89908893132838.1,  89908893132838.1,
               89908893132838.6,  89908893132838.6, 89908893132841.89,
              89908893132841.89, 89908893132841.95, 89908893132841.95,
              89908893132845.81, 89908893132845.81, 89908893132845.83,
              89908893132845.83, 89908893132845.88, 89908893132845.88,
              89908893132846.02, 89908893132846.02,  89908893132847.2,
               89908893132847.2, 89908893132847.67, 89908893132847.67,
               89908893132848.5,  89908893132848.5, 89908893132855.17,
              89908893132855.17, 89908893132855.45, 89908893132855.45,
              89908893132864.62, 89908893132864.62, 89908893132868.61,
              89908893132868.61, 89908893132873.16, 89908893132873.16,
               89908893132875.6,  89908893132875.6, 89908893132875.83,
              89908893132875.83, 89908893132878.73, 89908893132878.73,
               89908893132879.9,  89908893132879.9, 89908893132880.67,
              89908893132880.67, 89908893132880.69, 89908893132880.69,
              89908893132881.31, 89908893132881.31, 89908893132881.69,
              89908893132881.69, 89908893132884.45, 89908893132884.45,
              89908893132887.27, 89908893132887.27, 89908893132887.83,
              89908893132887.83,  89908893132892.8,  89908893132892.8,
              89908893132894.34, 89908893132894.34,  89908893132894.5,
               89908893132894.5, 89908893132901.88, 89908893132901.88,
              89908893132903.27, 89908893132903.27, 89908893132904.53,
              89908893132904.53, 89908893132909.27, 89908893132909.27,
              89908893132910.38, 89908893132910.38, 89908893132911.86,
              89908893132911.86,  89908893132913.4,  89908893132913.4,
              89908893132915.73, 89908893132915.73, 89908893132916.06,
              89908893132916.06, 89908893132922.48, 89908893132922.48,
              89908893132923.44, 89908893132923.44, 89908893132924.66,
              89908893132924.66, 89908893132925.14, 89908893132925.14,
              89908893132928.28, 89908893132928.28],
             dtype='float64', name='index1')

I wasted time scratching my head wondering why my groupby(['index1', 'index2']) had produced only one value of index1!
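A quick check for this, sketched below under the assumption that the level was originally made of integer IDs (the three IDs and the obs values are made up for illustration), is to count distinct values with nunique() instead of trusting the rounded printout. If the level really started out as integers, casting it back to int64 makes it print in full; integers below 2**53 survive the round trip through float64 exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical integer IDs that differ only in their last digits
ids = np.array([89908893132833, 89908893132834, 89908893132835], dtype=np.int64)

# Simulate the accidental int -> float conversion described above
df = pd.DataFrame({"index1": ids.astype("float64"),
                   "index2": [1, 2, 1],
                   "obs": [4, 54, 61]})
df = df.set_index(["index1", "index2"])

# Count distinct values instead of relying on the (possibly rounded) printout
print(df.index.get_level_values("index1").nunique())  # 3

# Move the level back to a column, restore int64, and rebuild the MultiIndex
df = df.reset_index()
df["index1"] = df["index1"].astype("int64")
df = df.set_index(["index1", "index2"])
print(df)  # index1 is now printed with every digit
```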

Upvotes: 1

jezrael

Reputation: 862691

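You get a MultiIndex: because names has fewer entries than the file has columns, read_csv uses the leading columns as the index. When pandas prints a MultiIndex, repeated values in the outer levels are shown only once, so those cells look empty even though the values are there.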
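To see this without the original file, here is a minimal sketch that assumes the real file has six tab-separated columns; the sample rows are made up and are read through io.StringIO in place of samplefile.txt. Passing sparsify=False to DataFrame.to_string prints every index key on every row.

```python
import io
import pandas as pd

# Hypothetical stand-in for samplefile.txt: 6 columns, 2043 repeated in the first
data = (
    "2043\t1\ta\t10\t20\t30\n"
    "2043\t1\tb\t11\t21\t31\n"
    "2043\t2\ta\t12\t22\t32\n"
)

# Only 3 names for 6 columns: the 3 leading columns become a MultiIndex
df = pd.read_csv(io.StringIO(data), sep="\t", header=None,
                 names=["234", "235", "236"])

print(df.index.nlevels)              # 3
print(df.index.get_level_values(0))  # 2043 is stored for every row
print(df.to_string(sparsify=False))  # repeat outer index values on every row
```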

You can convert the MultiIndex levels back into columns with reset_index:

df = df.reset_index()

Or pass a name for every column in the names parameter to avoid the MultiIndex:

df = pd.read_csv('samplefile.txt', sep='\t', names=["one", "two", "next", "234", "235", "236"])
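Both fixes can be compared side by side in a short sketch, again assuming a six-column, tab-separated file; the sample rows below are hypothetical and stand in for samplefile.txt. After reset_index the unnamed index levels come back as columns called level_0, level_1 and level_2, while naming all six columns gives a plain RangeIndex from the start.

```python
import io
import pandas as pd

# Hypothetical stand-in for samplefile.txt with six tab-separated columns
data = (
    "2043\t1\ta\t10\t20\t30\n"
    "2043\t1\tb\t11\t21\t31\n"
    "2043\t2\ta\t12\t22\t32\n"
)

# Fix 1: read as before, then turn the MultiIndex levels back into columns
df1 = pd.read_csv(io.StringIO(data), sep="\t", header=None,
                  names=["234", "235", "236"])
df1 = df1.reset_index()

# Fix 2: name all six columns so no leading column is used as the index
names = ["one", "two", "next", "234", "235", "236"]
df2 = pd.read_csv(io.StringIO(data), sep="\t", names=names)

print(df1)
print(df2)
```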

Upvotes: 4
