Reputation: 1
I am trying to parse lines of text and write them to a CSV file with fixed columns. The data looks like this:
Andrew Taggart 12345678 Math: 90 English: 78 Physics: 85
Jame Bond 1012478 English: 97 Physics: 85 Chemistry: 76
Hope Williams 1478978 Math: 89 English: 85 Physics: 76
and I want the output to look like this:
Name, Student_ID, Math, English, Physics, Chemistry
Andrew Taggart, 12345678, 90, 78, 85, -1
Jame Bond, 1012478, -1, 97, 85, 76
Hope Williams, 1478978, 89, 85, 76, -1
the format will be:
name|subject1|subject2|subject3
name1|23|34|23
name2|3|2|5
Note that the columns are separated by "," and, if a student has no grade for a subject, the value should be -1, as it is for Chemistry in the first row.
Here is my code so far.
import re
student = "Andrew Taggart 12345678 Math: 90 English: 78 Physics: 85"
name = re.split(r'(\d+)', student)[0].strip()  # extract name (strip the trailing space)
-> Output: Andrew Taggart
ID = re.split(r'(\d+)', student)[1]  # extract ID
-> Output: 12345678
headers = ["Math", "English", "Physics", "Chemistry"]
grade = re.findall(r"[-+]?\d*\.\d+|\d+", student)[1:]  # all numbers except the ID
-> Output: ["90", "78", "85"]
student_grade = list(i.strip().replace(":", "") for i in re.split(r"(\d+\.\d+|\d+)", student))[2:-1]
-> Output: ["Math", "90", "English", "78", "Physics", "85"]
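For reference, the subject names and grades can also be paired directly instead of being extracted as separate lists. The following is only a minimal sketch under the assumption that every grade is a whole number written as "Subject: grade"; the names `match`, `grades`, `row`, and `line` are illustrative and not part of the code above.

```python
import re

student = "Andrew Taggart 12345678 Math: 90 English: 78 Physics: 85"
headers = ["Math", "English", "Physics", "Chemistry"]

# Name = everything before the first run of digits; ID = that run of digits.
match = re.match(r"(?P<name>\D+?)\s+(?P<id>\d+)", student)
name, student_id = match.group("name"), match.group("id")

# Each "Subject: grade" pair becomes one dictionary entry.
grades = dict(re.findall(r"(\w+):\s*(\d+)", student))

# Subjects the student did not take fall back to "-1".
row = [name, student_id] + [grades.get(h, "-1") for h in headers]
line = ", ".join(row)
print(line)
```

Looking the grade up by header name is what lets a missing subject such as Chemistry default to -1.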
Upvotes: 0
Views: 226
Reputation: 996
This does not return exactly the format you want, but it may help.
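data = "Andrew Taggart 12345678 Math: 90 English: 78 Physics: 85"
formatted = {}
key = ""  # words collected since the last number
for word in data.split():
    try:
        num = int(word)
        formatted.update({key: num})
        key = ""
    except ValueError:
        if key == "":
            key = word
        else:
            key += f" {word}"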
The output looks like this:
{'Andrew Taggart': 12345678, 'Math:': 90, 'English:': 78, 'Physics:': 85}
You can easily use this format to create a csv.
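As a rough illustration of that last step, the dictionary could be written out with the standard `csv` module. This is a sketch, not the only way to do it: the column list `FIELDS` and the helper `to_row` are assumptions made for the example, and it relies on the name/ID pair being the first entry of the dictionary. `csv.DictWriter` fills any column missing from a row with its `restval`, which gives the -1 for subjects without a grade.

```python
import csv
import io

# Assumed column order, taken from the desired output in the question.
FIELDS = ["Name", "Student_ID", "Math", "English", "Physics", "Chemistry"]

def to_row(formatted):
    """Convert {'Andrew Taggart': 12345678, 'Math:': 90, ...} into a CSV row dict."""
    items = list(formatted.items())
    name, student_id = items[0]           # first pair is name -> student ID
    row = {"Name": name, "Student_ID": student_id}
    for subject, grade in items[1:]:
        row[subject.rstrip(":")] = grade   # 'Math:' -> 'Math'
    return row

formatted = {"Andrew Taggart": 12345678, "Math:": 90, "English:": 78, "Physics:": 85}

buffer = io.StringIO()  # stand-in for an open file object
writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval=-1, lineterminator="\n")
writer.writeheader()
writer.writerow(to_row(formatted))
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the `csv` module separates fields with "," only, without the space after the comma shown in the question.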
----EDIT------
result = []
data = """Andrew Taggart 12345678 Math: 90 English: 78 Physics: 85
Jame Bond 1012478 English: 97 Physics: 85 Chemistry: 76
Hope Williams 1478978 Math: 89 English: 85 Physics: 76"""
for line in data.split("\n"):
    student = True  # True until the student ID has been read
    key = ""        # words collected since the last number
    form = {}
    for word in line.split():
        try:
            num = int(word)
            if student:
                # the first number on a line is the student ID
                form.update({"name": key, "student_id": num})
                student, key = False, ""
                continue
            form.update({key: num})
            key = ""
        except ValueError:
            if key == "":
                key = word
                continue
            key += f" {word}"
    result.append(form)
Then just do:
import pandas as pd
pd.DataFrame(result).fillna(-1)
And you get:
             name  student_id  Math:  English:  Physics:  Chemistry:
0  Andrew Taggart    12345678   90.0        78        85        -1.0
1       Jame Bond     1012478   -1.0        97        85        76.0
2   Hope Williams     1478978   89.0        85        76        -1.0
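To get from this DataFrame to the exact CSV asked for in the question, the column names and dtypes need a little cleanup. The following is a hedged sketch: the `subjects` list and the renamed headers are assumptions taken from the desired output, and `result` is repeated here as a literal only so the example runs on its own.

```python
import pandas as pd

# Same shape as the `result` list built above (keys keep their trailing colon).
result = [
    {"name": "Andrew Taggart", "student_id": 12345678, "Math:": 90, "English:": 78, "Physics:": 85},
    {"name": "Jame Bond", "student_id": 1012478, "English:": 97, "Physics:": 85, "Chemistry:": 76},
    {"name": "Hope Williams", "student_id": 1478978, "Math:": 89, "English:": 85, "Physics:": 76},
]

subjects = ["Math", "English", "Physics", "Chemistry"]

df = (
    pd.DataFrame(result)
    .rename(columns=lambda c: c.rstrip(":"))                       # 'Math:' -> 'Math'
    .rename(columns={"name": "Name", "student_id": "Student_ID"})
    .reindex(columns=["Name", "Student_ID"] + subjects)            # fixed column order
    .fillna(-1)                                                    # missing grade -> -1
    .astype({s: int for s in subjects})                            # 90.0 -> 90
)

csv_text = df.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

Casting back to `int` is needed because a column containing missing values is stored as float until the gaps are filled.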
Upvotes: 1