I am not an AI researcher myself. However, I work on interactive theorem provers, which are a hot topic in the AI world, and so I interact with a lot of people and companies in the AI sphere.
Really since the start of the AI boom, there have been discussions about companies’ use of data to train their models, whether that be Reddit posts, books, copyrighted works of art, or, closer to home for me, academic papers or formalized definitions and proofs. The law sides with the AI companies here, saying that the use of such training data is “fair use”. However, there is another question here: is it academic misconduct not to cite the training data you (or someone else) used to train the AI model you used to help write your paper, or come up with your theory? I think many would fall back on the above “fair use” ruling and say no. However, citations have nothing to do with copyright. They are about crediting work that has been done, they make it easier to set precedents, and they have been so fundamental to scientific progress that the practice is drilled into students from very early on.
It is my opinion that if you are writing an academic paper or coming up with a theory with the help of an AI, and you are not citing the training data you (or the AI makers) used, you are not working to the level of academic integrity that we have subscribed to for so many decades.
The real problem is that asking for this level of academic integrity takes away many AI companies’ profit mechanisms.
(My views here are totally my own, and do not represent any position of projects I work on etc.)