the basic rules of music that sounds ok:

1) no near notes, even across two nearby octaves (treat them as if they were in the same octave), so the sound waves do not clash. this is why people invented scales: to keep notes at least a tone apart (one key skipped between keys on the piano), so that there are few or no half-tone combinations (adjacent keys on the piano), even between two nearby octaves. the pentatonic scale has no half-tone steps at all; the major scale has only two. another reason for the tone distance is that notes that are very close are hard to sing and to tell apart. scales are a systematic skipping of notes.
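a minimal sketch of "scales are systematic skipping of notes", assuming notes are numbered 0 to 11 within an octave (0 = C, 1 = C#, and so on). the function names and the step lists are made up for illustration, they are not from any library.

```python
# Pitch classes: 0 = C, 1 = C#, ... 11 = B (the octave is ignored on purpose).
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # whole/half steps of a major scale
PENTATONIC_STEPS = [2, 2, 3, 2, 3]    # major pentatonic: no half steps


def build_scale(tonic, steps):
    """Return the pitch classes reached by skipping keys according to `steps`."""
    notes = [tonic % 12]
    for step in steps[:-1]:           # the last step only leads back to the tonic
        notes.append((notes[-1] + step) % 12)
    return notes


def semitone_distance(a, b):
    """Distance between two notes as if they were in the same octave (0..6)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def half_tone_pairs(scale):
    """All pairs of scale notes that are only a half tone apart."""
    return [(a, b) for i, a in enumerate(scale) for b in scale[i + 1:]
            if semitone_distance(a, b) == 1]
```

for example, `half_tone_pairs(build_scale(6, PENTATONIC_STEPS))` (the five black keys) is empty, while the C major scale still has two half-tone pairs.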

2) at the end, “resolve”: get back to the nearest tonic on the scale (maybe up, maybe down). the parts of a scale are: 1, the first note on the scale, is the tonic (the pitch of the scale); -1, the note just below the tonic, is the alternative-tonic; 2 and 4 are alternative-dominants; 3 is the not-dominant; 5 is the dominant; 6 is a middle-dominant, somewhere between dominant and not-dominant.
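the "nearest tonic, up or down" step can be sketched like this, assuming notes are MIDI-style numbers (60 = middle C, 12 per octave). `nearest_tonic` is a hypothetical helper name, and resolving downward on a tie is an arbitrary choice.

```python
def nearest_tonic(note, tonic=60):
    """Return the note with the tonic's pitch class that is closest to `note`.

    A tie (exactly six semitones either way) resolves downward; that is an
    arbitrary choice made for this sketch.
    """
    offset = (note - tonic) % 12      # semitones above the closest tonic below
    if offset <= 6:
        return note - offset          # shorter (or equal) to step down
    return note + (12 - offset)       # shorter to step up
```

so a D just above middle C (62) resolves down to 60, and a B (71) resolves up to the C at 72.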

simple music is a trip that starts from [the tonic or the alternative-tonic (may be chosen randomly)], goes through a long mix of [not-dominant and middle-dominant, sprinkled with a few dominants from time to time, like a series of dominant and tonic combinations repeated multiple times], and then goes to the nearest tonic by the shortest route possible, maybe up or maybe down.
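a rough sketch of that trip as a random generator. the assumptions are mine: a major scale, MIDI-style note numbers, and a made-up `dominant_chance` parameter for how often the dominant is sprinkled in. it is one way to read the recipe, not the only one.

```python
import random

# Semitone offsets above the tonic for the scale parts used here
# (a major scale is assumed).
TONIC, ALT_TONIC = 0, -1      # degree 1 and the note just below it
NOT_DOMINANT = 4              # degree 3
MIDDLE_DOMINANT = 9           # degree 6
DOMINANT = 7                  # degree 5


def simple_melody(length=16, dominant_chance=0.2, tonic=60, seed=None):
    """Start at the tonic or alternative-tonic, wander, then resolve.

    `length` should be at least 2 (one start note and one resolving note).
    """
    rng = random.Random(seed)
    melody = [tonic + rng.choice([TONIC, ALT_TONIC])]
    for _ in range(length - 2):
        if rng.random() < dominant_chance:
            melody.append(tonic + DOMINANT)          # occasional dominant
        else:
            melody.append(tonic + rng.choice([NOT_DOMINANT, MIDDLE_DOMINANT]))
    # Resolve: go to the tonic nearest to the last note, up or down.
    last = melody[-1]
    offset = (last - tonic) % 12
    melody.append(last - offset if offset <= 6 else last + 12 - offset)
    return melody
```

calling `simple_melody(seed=1)` gives a repeatable 16-note list; changing `dominant_chance` changes how often the melody touches the dominant.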

*also, maybe model the teasing: how, and how fast, the music gets to the excitement, that is, to the dominant.

the off-scale sounds add stress, roughness, texture or uniqueness to the music.

3) use combinations of chord inversions (and chords) to enhance the acoustic range of the music. they are usually chosen to accent higher or lower pitches, to follow the melody, or as a way to get back to the tonic.

a chord is a tonic, a not-dominant and a dominant, like 1, 3, 5 on the scale, all played together. there is also a circle of 5ths: sometimes a C chord becomes a G chord (G is the 5th of C).
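chords, inversions and one step around the circle of 5ths can be sketched as below, assuming a major scale and MIDI-style note numbers (60 = middle C). the function names are hypothetical.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of degrees 1..7


def triad(root):
    """Degrees 1, 3 and 5 of the major scale built on `root`."""
    return [root + MAJOR_SCALE[degree - 1] for degree in (1, 3, 5)]


def invert(chord, times=1):
    """Move the lowest note up an octave, `times` times."""
    chord = list(chord)
    for _ in range(times):
        chord = chord[1:] + [chord[0] + 12]
    return chord


def next_on_circle_of_fifths(root):
    """The root a fifth (7 semitones) higher, e.g. C -> G."""
    return root + 7
```

`triad(60)` is the C chord `[60, 64, 67]`, `invert(triad(60))` is its first inversion `[64, 67, 72]`, and `triad(next_on_circle_of_fifths(60))` is the G chord.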

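the basic scale to experiment with is the black keys plus the white key just before the group of 3 black keys and the white key just after it (F and B, which together with the black keys give the F-sharp major scale).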

music is scale independent: what matters is each note relative to the tonic,
so you need a model that can represent it that way. i guess something convolutional combined with something from language processing might work, plus regularization so that it does not pick notes that are too distant.
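this is not the model itself, only a sketch of two pieces around it, under my own assumptions: a tonic-relative encoding (so the same tune in two keys looks identical) and a simple penalty that could play the role of the "not too distant" regularization. `max_jump=4` is an arbitrary value picked for illustration.

```python
def relative_to_tonic(melody, tonic):
    """Encode each note as its distance in semitones from the tonic."""
    return [note - tonic for note in melody]


def jump_penalty(melody, max_jump=4):
    """Sum of how far each step exceeds `max_jump` semitones (0 if none do)."""
    return sum(max(0, abs(b - a) - max_jump)
               for a, b in zip(melody, melody[1:]))
```

the same phrase in C (`[60, 64, 67, 60]`) and in G (`[67, 71, 74, 67]`) both encode to `[0, 4, 7, 0]`, and a melody made of octave jumps gets a much larger penalty than one that moves by steps.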

when someone invents music, it usually consists of a catchy phrase and variations around it. a catchy phrase is a uniquely distinguishable and recognizable rhythm pattern, with a uniquely distinguishable and recognizable pitch pattern painted on top of the rhythm.
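one way to picture "pitch painted on top of rhythm", assuming a rhythm is a list of durations in beats and a pitch pattern is a list of MIDI-style note numbers. `paint` and `vary` are made-up names, and shifting the pitches is only one of many possible kinds of variation.

```python
def paint(rhythm, pitches):
    """Pair each duration with a pitch, cycling the pitch pattern if needed."""
    return [(pitches[i % len(pitches)], duration)
            for i, duration in enumerate(rhythm)]


def vary(phrase, shift):
    """A variation: same rhythm, pitch pattern moved by `shift` semitones."""
    return [(pitch + shift, duration) for pitch, duration in phrase]
```

with `rhythm = [1, 0.5, 0.5, 2]` and `pitches = [60, 64, 67]`, `paint` gives a four-note phrase, and `vary(phrase, 2)` keeps the rhythm while moving every pitch up a tone.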

so music is made of musical-composition-behaviors: temporal patterns to be recognized, like words, with layers of combinations of those patterns, like sentences. a generative model built from parallel lstms might fit this.