Reputation: 179
Sorry if the title is a little confusing, but the problem is the following:
Say I have a .txt file that looks like this,
Bob Sales
12
33
45
Sam Sales
23
Wendy Sales
12
33
45
64
54
And I want my dataframe to be,
Bob 12
Bob 33
Bob 45
Sam 23
Wendy 12
Wendy 33
Wendy 45
Wendy 64
Wendy 54
What's the best way to import the .txt file to create the pandas dataframe without iterating line by line? (I'm aware iteration would easily solve this, but I'm curious if it's possible without iteration).
Upvotes: 1
Views: 278
Reputation: 153500
IIUC:
from io import StringIO
txt = StringIO("""Bob Sales
12
33
45
Sam Sales
23
Wendy Sales
12
33
45
64
54""")
df = pd.read_csv(txt, header=None, sep='\s\s+', engine='python')
df[1] = df[0].str.extract('([a-zA-Z ]+)').ffill()
df_out = df[df[0] != df[1]]
print(df_out)
Output:
0 1
1 12 Bob Sales
2 33 Bob Sales
3 45 Bob Sales
5 23 Sam Sales
7 12 Wendy Sales
8 33 Wendy Sales
9 45 Wendy Sales
10 64 Wendy Sales
11 54 Wendy Sales
Details: using regex look for a pattern create a new column with only rows with that pattern, use ffill to replicate the previous value down in that new column. Next, filter the dataframe where the original column is not equal to the new column.
Upvotes: 3