How to create a dataframe column with repeated string value?

Question

I'm reading in data from a bunch of files and storing it in a data frame. I want a column of the data frame to indicate which file the data came from. How do I create a column that has the same string repeated over and over without typing it out manually?

Each file I'm reading in has ~100 data points (but not the same number each time). As I read each one in, I concatenate it to the dataframe along axis=0. The result should look like this:

import numpy as np
import pandas as pd
numbers = np.random.randn(5) # this data could be of any length, ~100
labels = np.array(['file01','file01','file01','file01','file01']) 
tf = pd.DataFrame()
tf['labels'] = labels
tf['numbers'] = numbers

In [8]: tf
Out[8]: 
   labels   numbers
0  file01 -0.176737
1  file01 -1.243871
2  file01  0.154886
3  file01  0.236653
4  file01 -0.195053

(Yes, I know I could make 'file01' a column header and append each one along axis=1, but there are reasons I don't want to do it that way.)
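If you want to keep building the labels as an array, as in the code above, one option is to let NumPy generate it. The following is a minimal sketch, assuming the label for a file is known as a single string: `np.full` creates an array of the requested length filled with one value, so the label is never typed out per row.

```python
import numpy as np
import pandas as pd

numbers = np.random.randn(5)  # stand-in for one file's data (any length)

# np.full returns an array of len(numbers) copies of the same string,
# so the label column always matches the data length automatically
labels = np.full(len(numbers), 'file01')

tf = pd.DataFrame({'labels': labels, 'numbers': numbers})
print(tf)
```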

Flavian Hautbois · Accepted Answer

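There you go, your code is fixed! You can actually put a single value in the dict used in the DataFrame constructor :). Pandas repeats (broadcasts) the scalar to match the length of the array-like columns, here numbers. At least one value must be array-like (or you must pass index=...); if every value is a scalar, pandas raises a ValueError.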

import numpy as np
import pandas as pd
filename = 'file01'
numbers = np.random.randn(5) # this data could be of any length, ~100
tf = pd.DataFrame({'labels': filename, 'numbers': numbers})

In [8]: tf
Out[8]: 
   labels   numbers
0  file01 -0.176737
1  file01 -1.243871
2  file01  0.154886
3  file01  0.236653
4  file01 -0.195053
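To apply this to the multi-file workflow from the question, you can build one small frame per file and stack them along axis=0. The sketch below is illustrative only: `load_numbers` and the file names are hypothetical stand-ins for however the files are actually read, and they simply return random arrays of varying length.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def load_numbers(filename):
    # Hypothetical loader standing in for reading `filename` from disk;
    # returns a different number of values (90 to 109) per call.
    return rng.standard_normal(rng.integers(90, 110))

filenames = ['file01', 'file02', 'file03']  # hypothetical file names

frames = []
for filename in filenames:
    numbers = load_numbers(filename)
    # the scalar filename is broadcast to len(numbers) rows
    frames.append(pd.DataFrame({'labels': filename, 'numbers': numbers}))

# stack all per-file frames row-wise; ignore_index gives a fresh 0..n-1 index
tf = pd.concat(frames, axis=0, ignore_index=True)
print(tf['labels'].value_counts())
```

Collecting the frames in a list and calling `pd.concat` once avoids copying the growing frame on every iteration. Assigning a scalar to a column of an existing, non-empty DataFrame (for example `tf['labels'] = filename`) broadcasts the same way.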
