My Projects This Year
- bsample301
- Jul 9
I like to keep this site mostly about my work with hockey. However, I thought this would be a great space to go into further detail about what I have accomplished with MoeAnalytics this season.
Baseball RPI & SOS
RPI (Rating Percentage Index) is a great way to rank teams in sports. It is mainly used in college sports like basketball and baseball.
Earlier this year I stumbled upon warrennolan.com, which has rankings for many different college sports. The part I was most interested in was its college baseball rankings and the way it gets schedules and scores. A big part of the site is its baseball RPI rankings, and that inspired me to see if I could build the same thing for high school baseball in Ohio.
Another reason I wanted to try this is the way high school sports in Ohio work. Of the big 4 sports (football, basketball, hockey, baseball), baseball is the only one that does not use a formula to set up its postseason tournament. Football uses a formula developed by the OHSAA called the Football Computer Rankings, which Joe Eitel also tracks live on his website.
Basketball has started to use the RPI developed by the OHSAA in coordination with MaxPreps. And finally, just this last season, hockey started using the computerized rankings from MaxPreps instead of the traditional coaches' vote.
Baseball has not implemented a system like this. It still relies on the traditional coaches' vote to determine seeds and matchups, which for the most part has gone well.
But while coaches' rankings and votes are a great way to see how teams stack up against each other, I wanted to see what the data says.
I ran into multiple problems while trying to conduct this experiment.
Problem #1: How do I even do this?
I have very little coding experience, so my only option was Excel or Google Sheets. That is not a bad thing, as I have a lot of experience with spreadsheets.
If I was going to do this, I wanted the data to update automatically, for two main reasons.
The first reason is that pretty much the only place to get all the data I need is MaxPreps. MaxPreps is an amazing site with a lot of information, but I have learned from previous experience that it is very difficult to get data out of it. You can copy and paste a team's schedule from the site, but it pastes in a very wonky format, and it is a pain to go through every schedule and clean it up.
The second reason is that, even if I were able to copy and paste the schedules in a nice format, copying and pasting 100+ schedules into a spreadsheet takes a lot of time.
But one day, while looking some things up, I found my answer: the IMPORTHTML formula.
This formula was incredibly helpful. It automatically refreshed the data whenever the page I linked it to was updated, and it imported the schedules in a very clean format that was easy to work with. IMPORTHTML is a Google Sheets formula, so that meant Sheets was what I would use for this project.
This made it very easy to get the schedules for all the teams I needed. With as many IMPORTHTML calls as I had, the sheet did take some time to load them all, but it was still a lot easier than collecting every schedule myself.
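For readers who think in code rather than spreadsheets: in Sheets, IMPORTHTML takes a page URL, a query type ("table" or "list"), and a 1-based index saying which table or list on the page to pull. The Python below is only a rough sketch of that idea using the standard library. It is not what the sheet actually runs, the sample markup is made up, and the names TableGrabber and import_html_table are hypothetical. It also assumes the page has no tables nested inside other tables.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return the index-th table (1-based, like IMPORTHTML) as rows of strings."""
    parser = TableGrabber()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up schedule markup, for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><th>Date</th><th>Opponent</th><th>Result</th></tr>
  <tr><td>4/1</td><td>Team B</td><td>W 5-2</td></tr>
  <tr><td>4/3</td><td>Team C</td><td>L 1-4</td></tr>
</table>
"""

schedule = import_html_table(SAMPLE_PAGE, 1)
for row in schedule:
    print(row)
```

The spreadsheet formula does the fetching and the refreshing for you, which is the whole appeal. In code you would have to download the page and rerun the parse yourself.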
Problem #2: What teams do I even use?
For this project, I knew I wanted to start with the RPI of D1 teams only. If I want to expand later, I can add more divisions, but D1 was the place to start.
This past season the OHSAA restructured baseball to have 7 divisions instead of 4. This made D1 much smaller, leaving it with only 65 teams, which was a nice size for the sheet.
However, D1 teams play many games against non-D1 opponents, especially in Northern Ohio, where there are fewer D1 teams.
My solution was to only include D1 and D2 teams in my formula. A main reason for this was that there were only 64 teams in D2, but in D3 there were 125. If I also included D3 teams, it would be more accurate, but I don’t know if the sheet would be able to handle that. So I just kept it to D2 to start with, until I find a better way to formulate it.
All that was left was to keep up with it during the season and get the results.
For the RPI formula, I just used the basic formula: 0.25*(Team’s Winning Percentage)+0.50*(Opponents’ Winning Percentage)+0.25*(Opponents’ Opponents’ Winning Percentage)
A lot of people average each opponent's Winning Percentage, but I combined the opponents' records and used the total Winning Percentage (total wins divided by total games).
I also calculated the SOS (Strength of Schedule) of the D1 teams, like I did in a project last year (Tweet 1, Tweet 2, Tweet 3). For SOS, I just used the Opponent Winning Percentage to get it, nothing fancy.
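The spreadsheet itself is all formulas, but the same math can be sketched in a few lines of Python. This is a minimal sketch under some assumptions. The game list is made up, the function names are hypothetical, and it does not remove a team's own games from its opponents' records, which some RPI versions do. It uses the 0.25 / 0.50 / 0.25 weights above, pools the opponents' records into one total Winning Percentage, and reports SOS as the Opponents' Winning Percentage.

```python
from collections import defaultdict

# Hypothetical results as (winner, loser) pairs; not real scores.
GAMES = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("C", "D"),
    ("D", "A"),
]


def build_records(games):
    """Return {team: [wins, losses]} and {team: [opponent for each game played]}."""
    records = defaultdict(lambda: [0, 0])
    opponents = defaultdict(list)
    for winner, loser in games:
        records[winner][0] += 1
        records[loser][1] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)
    return records, opponents


def pooled_pct(teams, records):
    """Total wins divided by total games across the listed teams."""
    wins = sum(records[t][0] for t in teams)
    played = sum(records[t][0] + records[t][1] for t in teams)
    return wins / played if played else 0.0


def averaged_pct(teams, records):
    """The other common approach: average each team's own winning percentage."""
    pcts = [pooled_pct([t], records) for t in teams]
    return sum(pcts) / len(pcts) if pcts else 0.0


def rpi_table(games):
    """Return {team: {"rpi": ..., "sos": ...}} using pooled opponent records."""
    records, opponents = build_records(games)
    table = {}
    for team in list(records):
        wp = pooled_pct([team], records)
        owp = pooled_pct(opponents[team], records)
        # Opponents' opponents: every opponent of every opponent, pooled.
        opp_opps = [oo for o in opponents[team] for oo in opponents[o]]
        oowp = pooled_pct(opp_opps, records)
        table[team] = {"rpi": 0.25 * wp + 0.50 * owp + 0.25 * oowp, "sos": owp}
    return table


table = rpi_table(GAMES)
for team in sorted(table, key=lambda t: table[t]["rpi"], reverse=True):
    print(team, round(table[team]["rpi"], 4), round(table[team]["sos"], 4))
```

Swapping pooled_pct for averaged_pct inside rpi_table would give the averaged version. The two usually land close to each other, but they are not identical when opponents have played different numbers of games.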
Here are my RPI results:
1 JACKSON
2 ST IGNATIUS
3 ARCHBISHOP MOELLER
4 GLENOAK
5 PERRYSBURG
6 OLENTANGY
7 FINDLAY
8 OLENTANGY ORANGE
9 OLENTANGY BERLIN
10 OLENTANGY LIBERTY
11 MEDINA
12 McKINLEY
13 MENTOR
14 ELDER
15 MARYSVILLE
16 LITTLE MIAMI
17 LAKOTA WEST
18 HILLIARD DARBY
19 CENTERVILLE
20 ST EDWARD
21 BEAVERCREEK
22 BRUNSWICK
23 THOMAS WORTHINGTON
24 MASON
25 LEBANON
26 ST XAVIER
27 LANCASTER
28 GROVE CITY
29 SPRINGBORO
30 STRONGSVILLE
31 LAKOTA EAST
32 PRINCETON
33 WESTERN HILLS
34 UPPER ARLINGTON
35 HILLIARD DAVIDSON
36 BEREA-MIDPARK
37 PICKERINGTON CENTRAL
38 NEWARK
39 MILFORD
40 JOHN MARSHALL
41 WEST CLERMONT
42 DUBLIN JEROME
43 PICKERINGTON NORTH
44 FAIRMONT
45 HILLIARD BRADLEY
46 OAK HILLS
47 WHITMER
48 HAYES
49 HAMILTON
50 SYCAMORE
51 WESTLAND
52 DUBLIN COFFMAN
53 WALNUT HILLS
54 COLERAIN
55 ELYRIA
56 LINCOLN
57 SPRINGFIELD
58 WAYNE
59 REYNOLDSBURG
60 CLEVELAND HEIGHTS
61 FAIRFIELD
62 CENTRAL CROSSING
63 MIDDLETOWN
64 LORAIN
65 GROVEPORT MADISON
Here are my SOS Results (ranked by hardest schedule first):
1 ST IGNATIUS
2 WHITMER
3 PERRYSBURG
4 McKINLEY
5 MARYSVILLE
6 DUBLIN JEROME
7 ST EDWARD
8 ELDER
9 CENTERVILLE
10 HILLIARD DARBY
11 ST XAVIER
12 WALNUT HILLS
13 ARCHBISHOP MOELLER
14 OLENTANGY
15 BEREA-MIDPARK
16 FINDLAY
17 GLENOAK
18 JACKSON
19 WESTERN HILLS
20 ELYRIA
21 CLEVELAND HEIGHTS
22 THOMAS WORTHINGTON
23 OLENTANGY BERLIN
24 DUBLIN COFFMAN
25 UPPER ARLINGTON
26 OLENTANGY ORANGE
27 FAIRFIELD
28 MIDDLETOWN
29 STRONGSVILLE
30 FAIRMONT
31 HILLIARD DAVIDSON
32 WEST CLERMONT
33 OLENTANGY LIBERTY
34 LAKOTA EAST
35 BRUNSWICK
36 SYCAMORE
37 SPRINGFIELD
38 PICKERINGTON CENTRAL
39 LAKOTA WEST
40 MILFORD
41 WAYNE
42 HILLIARD BRADLEY
43 CENTRAL CROSSING
44 LEBANON
45 PICKERINGTON NORTH
46 LITTLE MIAMI
47 NEWARK
48 GROVE CITY
49 JOHN MARSHALL
50 MENTOR
51 MEDINA
52 REYNOLDSBURG
53 BEAVERCREEK
54 HAYES
55 COLERAIN
56 MASON
56 PRINCETON
58 GROVEPORT MADISON
59 LINCOLN
60 SPRINGBORO
61 HAMILTON
62 LANCASTER
63 WESTLAND
64 OAK HILLS
65 LORAIN
I also decided to do the RPI rankings of the conferences of the D1 teams:
1 Federal
2 Independent
3 NLL BUCK
4 GCL
5 OCC CARD
6 GCC
7 OCC CEN
8 ECC
9 CMAC
10 GWOC
11 SAL
12 GMC
13 SWC
14 OCC OH
15 OCC BUCK
16 OCC CAP
17 Lake Erie
Doing the RPI project allowed me to look at how individual teams are doing as well:

Park Factor
Park Factor was the 2nd project that I did this year. I did it before the season started, back in February.
Park Factor is a way to see how hitter or pitcher friendly a park is, because as we all know, not all parks are created equal.
Here is the formula for Park Factor:

100 is neutral, below 100 is a pitcher-friendly park, and above 100 is a hitter-friendly park.
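For reference, one common basic version of Park Factor compares the runs per game (scored plus allowed) in a team's home games with the runs per game in its road games, then multiplies by 100. The sketch below assumes that version, which may not match the exact formula used for the rankings here. The function name and the sample numbers are made up for illustration.

```python
def park_factor(home_rs, home_ra, home_games, road_rs, road_ra, road_games):
    """Basic park factor: 100 * (runs per home game) / (runs per road game).

    rs = runs scored by the team, ra = runs allowed by the team.
    For a multi-season factor, pass totals summed over all the seasons.
    """
    home_rate = (home_rs + home_ra) / home_games
    road_rate = (road_rs + road_ra) / road_games
    return 100 * home_rate / road_rate


# Made-up totals: 7.0 runs per game at home vs 6.0 on the road.
example = park_factor(home_rs=150, home_ra=130, home_games=40,
                      road_rs=120, road_ra=120, road_games=40)
print(round(example, 1))
```

Because both teams' runs are counted, a strong or weak lineup mostly cancels out, and what is left is closer to the effect of the park itself.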
I collected data for 40 teams and their fields around the Cincinnati and Dayton area.
I collected data for 3 seasons, which at the time meant 2022-2024. Here are my results:
1 Anderson, Anderson High School
2 Princeton, Princeton Baseball Fields
3 Badin, Joyce Park
4 Milford, Milford High School
5 Lakota East, Baseball Field
6 Wayne, Wayne High School
7 St. Xavier, Baseball Stadium
8 Fenwick, Fenwick High School
9 Winton Woods, Winton Woods High School
10 Centerville, Booster Park
11 West Clermont, West Clermont High School
12 Oak Hills, Oak Hills High School
13 Carroll, Carroll High School
14 Fairmont, Fairmont Park
15 Sycamore, Sycamore High School
16 McNicholas, Paradise Athletic Complex
17 Lakota West, Firebird Field
18 Chaminade Julienne, Howell Field
19 Miamisburg, Toadvine Field
20 Alter, Nischwitz Stadium
21 Springboro, Lundt Baseball Field
22 Colerain, Colerain High School
23 Lebanon, Lebanon Junior High School
24 Kings, Kings High School
25 Moeller, Kremchek Stadium
26 Hamilton, Hamilton High School
27 LaSalle, Lancer Baseball Field
28 CHCA, Robert Gardner Baseball Stadium
29 Turpin, Turpin High School
30 Northmont, Northmont High School
31 Walnut Hills, Reds Urban Youth Academy
32 Middletown, Lefferson Park
33 Mason, Mason Middle School
34 Fairfield, Joe Nuxhall Field
35 Beavercreek, Mark Stewart Field
36 Loveland, Dave Evans Field
37 Vandalia Butler, Vandalia Butler High School
38 Little Miami, Little Miami High School
39 Springfield, Springfield High School
40 Elder, Panther Athletic Complex
Chaminade Julienne (CJ), Miamisburg, and Alter are the closest to 100 that I have, so the parks ranked above them are hitter-friendly and the parks ranked below them are pitcher-friendly.