Esteban
Esteban

Reputation: 179

How to create a dataframe from .txt file with headers mixed in the rows, without iteration?

Sorry if the title is a little confusing, but the problem is the following:

Say I have a .txt file that looks like this,

Bob Sales
12
33
45
Sam Sales
23
Wendy Sales
12
33
45
64
54

And I want my dataframe to be,

Bob    12
Bob    33
Bob    45
Sam    23
Wendy  12
Wendy  33
Wendy  45
Wendy  64
Wendy  54

What's the best way to import the .txt file to create the pandas dataframe without iterating line by line? (I'm aware iteration would easily solve this, but I'm curious if it's possible without iteration).

Upvotes: 1

Views: 278

Answers (1)

Scott Boston
Scott Boston

Reputation: 153500

IIUC:

from io import StringIO
txt = StringIO("""Bob Sales
12
33
45
Sam Sales
23
Wendy Sales
12
33
45
64
54""")

df = pd.read_csv(txt, header=None, sep='\s\s+', engine='python')

df[1] = df[0].str.extract('([a-zA-Z ]+)').ffill()
df_out = df[df[0] != df[1]]
print(df_out)

Output:

     0            1
1   12    Bob Sales
2   33    Bob Sales
3   45    Bob Sales
5   23    Sam Sales
7   12  Wendy Sales
8   33  Wendy Sales
9   45  Wendy Sales
10  64  Wendy Sales
11  54  Wendy Sales

Details: using regex look for a pattern create a new column with only rows with that pattern, use ffill to replicate the previous value down in that new column. Next, filter the dataframe where the original column is not equal to the new column. ​

Upvotes: 3

Related Questions